At GCB Hub, we enhance the understanding of biomarkers and their causal connections with complex traits and diseases across global populations.
Proteome-wide association studies (PWAS) help decode the proteomic mechanisms underlying complex diseases. Traditional PWAS model training relies heavily on individual-level reference proteomes, which restricts its capacity to harness the growing body of summary-level protein quantitative trait loci (pQTL) data in the public domain. Here we introduce a novel framework to train PWAS models directly from pQTL summary statistics. By leveraging extensive pQTL data from the UK Biobank, deCODE, and ARIC studies, we applied our approach to train large-scale European PWAS models (total n = 88,838 subjects). Furthermore, we developed PWAS models tailored for Asian and African ancestries by integrating multi-ancestry summary-level and individual-level data resources (total n = 914 for Asian and 3,042 for African ancestries). We validated the performance of our PWAS models through a systematic multi-ancestry analysis of over 700 phenotypes across five major genetic data resources. Our results bridge the gap between genomics and proteomics for drug discovery, highlighting novel protein-phenotype links and their transferability across diverse ancestries.
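A common way to apply a trained PWAS model to GWAS summary statistics is the weighted z-score test used by FUSION-style methods: Z = wᵀz / sqrt(wᵀRw), where w holds the protein model's cis-SNP weights, z the GWAS z-scores, and R the SNP correlation (LD) matrix from a reference panel. The sketch below illustrates that test only; it is not the BLISS training procedure, it assumes weights, z-scores, and LD are already aligned to the same SNPs and effect alleles, and the function name `pwas_z` is hypothetical.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """Hypothetical helper: FUSION-style PWAS z-score for one protein.

    weights: cis-SNP weights from a protein prediction model (length m)
    gwas_z:  GWAS z-scores for the same m SNPs, same order and effect alleles
    ld:      m x m SNP correlation matrix (list of lists) from a reference panel
    """
    m = len(weights)
    if len(gwas_z) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("weights, z-scores and LD matrix must cover the same SNPs")

    # Numerator: weighted sum of GWAS z-scores, w^T z.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Variance of w^T z under the null, accounting for LD: w^T R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    if variance <= 0:
        raise ValueError("w^T R w must be positive")

    return numerator / math.sqrt(variance)


# Two SNPs with equal weights and equal GWAS signal:
independent = pwas_z([0.5, 0.5], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
perfect_ld = pwas_z([0.5, 0.5], [2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]])
```

With independent SNPs the two signals reinforce each other (Z = 2√2 ≈ 2.83), whereas in perfect LD they carry the same information and Z stays at 2, which is why the denominator needs the reference LD matrix.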
Note:
BLISS code and the trained models can be downloaded from our data repository. For the most current BLISS code, please refer to our GitHub page.
For each GWAS resource, we applied the PWAS models with matched ancestry and analyzed only proteins with cis-heritability exceeding 0.01. The eight available models are:
Name | Platform | Method | Ancestry | Training sample size | No. of proteins |
---|---|---|---|---|---|
UKB | OLink | BLISS | EUR | 46,066 | 1,412 |
ARIC | SomaScan | BLISS | EUR | 7,213 | 4,423 |
deCODE | SomaScan | BLISS | EUR | 35,559 | 4,428 |
UKB_AFR_std | OLink | Standard PWAS | AFR | 1,171 | 1,412 |
UKB_AFR_super | OLink | BLISS (Super Learner) | AFR | 1,171 | 1,412 |
ARIC_AA | SomaScan | BLISS | African American | 1,871 | 4,415 |
UKB_ASN_std | OLink | Standard PWAS | Asian | 914 | 1,412 |
UKB_ASN_super | OLink | BLISS (Super Learner) | Asian | 914 | 1,412 |
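The ancestry matching and cis-heritability filter described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory catalog that mirrors the table (using its ancestry labels verbatim) and a hypothetical dictionary of per-protein cis-heritability estimates; none of these names come from the BLISS code.

```python
# Hypothetical catalog mirroring the model table: model name -> ancestry label.
MODEL_ANCESTRY = {
    "UKB": "EUR",
    "ARIC": "EUR",
    "deCODE": "EUR",
    "UKB_AFR_std": "AFR",
    "UKB_AFR_super": "AFR",
    "ARIC_AA": "African American",
    "UKB_ASN_std": "Asian",
    "UKB_ASN_super": "Asian",
}

# Proteins are analyzed only if cis-heritability exceeds this value.
CIS_H2_THRESHOLD = 0.01


def models_for_ancestry(ancestries):
    """Return the model names whose training ancestry is in `ancestries` (sorted)."""
    return sorted(
        name for name, ancestry in MODEL_ANCESTRY.items() if ancestry in ancestries
    )


def heritable_proteins(cis_h2, threshold=CIS_H2_THRESHOLD):
    """Keep proteins whose cis-heritability is strictly greater than `threshold`.

    cis_h2: hypothetical mapping of protein ID -> estimated cis-heritability.
    """
    return {protein for protein, h2 in cis_h2.items() if h2 > threshold}


# Example: pick models for a European-ancestry GWAS, then filter proteins.
eur_models = models_for_ancestry({"EUR"})
kept = heritable_proteins({"PROT_A": 0.02, "PROT_B": 0.01, "PROT_C": 0.005})
```

Whether an African-ancestry GWAS should be paired with `ARIC_AA` as well as the UKB AFR models is left to the caller here: the sketch matches labels exactly, so both `"AFR"` and `"African American"` must be passed explicitly.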
We use SNP identifiers (rsIDs) to link different datasets; all positions are on GRCh37.
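Linking a PWAS model to a GWAS by rsID can be sketched as an inner join on the identifier. The allele-alignment step below (flipping the GWAS z-score when effect and other alleles are swapped, and dropping SNPs with incompatible alleles) is an assumption reflecting common practice rather than something stated above; strand flips and strand-ambiguous SNPs are not handled. The function name and the made-up rsIDs are hypothetical.

```python
def link_by_rsid(weights, gwas):
    """Hypothetical helper: join model weights and GWAS results on rsID.

    weights: rsid -> (effect_allele, other_allele, weight)
    gwas:    rsid -> (effect_allele, other_allele, z)
    Returns rsid -> (weight, z aligned to the model's effect allele)
    for SNPs present in both inputs with compatible alleles.
    """
    linked = {}
    for rsid, (ea, oa, w) in weights.items():
        if rsid not in gwas:
            continue  # SNP absent from the GWAS: cannot be used
        g_ea, g_oa, z = gwas[rsid]
        if (g_ea.upper(), g_oa.upper()) == (ea.upper(), oa.upper()):
            linked[rsid] = (w, z)  # same coding: keep z as-is
        elif (g_ea.upper(), g_oa.upper()) == (oa.upper(), ea.upper()):
            linked[rsid] = (w, -z)  # swapped alleles: flip the sign of z
        # Any other allele pair is incompatible and the SNP is dropped.
    return linked


# Made-up rsIDs for illustration only.
model_weights = {"rs0001": ("A", "G", 0.3), "rs0002": ("C", "T", -0.1), "rs0003": ("A", "C", 0.2)}
gwas_results = {"rs0001": ("A", "G", 2.0), "rs0002": ("T", "C", 1.5), "rs0004": ("G", "A", 0.5)}
merged = link_by_rsid(model_weights, gwas_results)
```

Here `rs0002` is coded with swapped alleles in the GWAS, so its z-score is negated, while `rs0003` and `rs0004` appear in only one input and are dropped.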
This is a comprehensive multi-ancestry methylome-wide association study (MWAS) conducted on purified monocytes from European American (EA) and African American (AA) populations.
The data presented here are valuable for researchers investigating the epigenetic basis of complex diseases, particularly those mediated by the immune system.
GCB Hub aims to assemble a passionate team dedicated to open science, pulling together skills from a wide range of disciplines, including statistics, biostatistics, causal inference, epidemiology, drug development, clinical medicine, human genomics, as well as web and software development.
Leadership Team:
Chong Wu (MD Anderson) and Bingxin Zhao (UPenn)
Current and past members:
Zichen Zhang (MD Anderson), Xiaochen Yang (Purdue), Wanheng Zhang (MD Anderson)
We are dedicated to identifying causal biomarkers, including but not limited to proteins, genes (expression and splicing), and CpG sites, for complex traits and diseases across different domains. The resulting resources can help address many relevant scientific questions and help researchers pinpoint the most promising targets for subsequent functional analysis, drug development, and drug repurposing.
If you have QTL datasets or would like to deposit your summary statistics to GCB Hub, please feel free to reach out to Chong Wu and Bingxin Zhao.
This work is licensed under the CC BY-NC-ND 4.0 license.