About

At GCB Hub, we enhance the understanding of biomarkers and their causal connections with complex traits and diseases across global populations.

Science

Large-scale imputation models for multi-ancestry proteome-wide association analysis.

Proteome-wide association studies (PWAS) decode the intricate proteomic landscape of biological mechanisms for complex diseases. Traditional PWAS model training relies heavily on individual-level reference proteomes, which restricts its capacity to harness the emerging summary-level protein quantitative trait loci (pQTL) data in the public domain. Here we introduce a novel framework to train PWAS models directly from pQTL summary statistics. By leveraging extensive pQTL data from the UK Biobank, deCODE, and ARIC studies, we applied our approach to train large-scale European PWAS models (total n = 88,838 subjects). Furthermore, we developed PWAS models tailored for Asian and African ancestries by integrating multi-ancestry summary and individual-level data resources (total n = 914 for Asian and 3,042 for African ancestries). We validated the performance of our PWAS models through a systematic multi-ancestry analysis of over 700 phenotypes across five major genetic data resources. Our results bridge the gap between genomics and proteomics for drug discovery, highlighting novel protein-phenotype links and their transferability across diverse ancestries.
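
For orientation, here is a minimal sketch of how an imputation model is typically applied to GWAS summary statistics. It uses the generic TWAS-style statistic Z = (w'z) / sqrt(w'Rw), where w holds the SNP weights of one protein model, z the GWAS z-scores of the same SNPs, and R their LD correlation matrix. Whether BLISS uses exactly this form is an assumption, and the function name `pwas_z` and all inputs are hypothetical.

```python
import math
from typing import Sequence


def pwas_z(
    weights: Sequence[float],
    gwas_z: Sequence[float],
    ld: Sequence[Sequence[float]],
) -> float:
    """Generic summary-statistic association test (assumed form, not BLISS code).

    weights: SNP weights from one protein imputation model
    gwas_z:  GWAS z-scores for the same SNPs, in the same order
    ld:      LD correlation matrix for those SNPs
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")

    # Numerator: weighted sum of GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Denominator: variance of the weighted sum under the null, w' R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' R w must be positive")

    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Toy values, made up for illustration only.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(pwas_z([1.0, 0.0], [2.0, 5.0], identity))
```

With a single nonzero weight and uncorrelated SNPs, the statistic reduces to the GWAS z-score of that SNP, which is a quick sanity check on the formula.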

BLISS illustration

Note:

  1. BLISS code and developed models can be downloaded from our data repository. For the most current BLISS code, please refer to our GitHub page.

  2. In each GWAS database, we applied the PWAS models with matched ancestry and only analyzed proteins with cis-heritability exceeding 0.01. All eight possible models are:

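    | Name          | Platform | Method                | Ancestry         | Training sample size | # proteins |
    |---------------|----------|-----------------------|------------------|----------------------|------------|
    | UKB           | OLink    | BLISS                 | EUR              | 46,066               | 1,412      |
    | ARIC          | SomaScan | BLISS                 | EUR              | 7,213                | 4,423      |
    | deCODE        | SomaScan | BLISS                 | EUR              | 35,559               | 4,428      |
    | UKB_AFR_std   | OLink    | Standard PWAS         | AFR              | 1,171                | 1,412      |
    | UKB_AFR_super | OLink    | BLISS (Super Learner) | AFR              | 1,171                | 1,412      |
    | ARIC_AA       | SomaScan | BLISS                 | African American | 1,871                | 4,415      |
    | UKB_ASN_std   | OLink    | Standard PWAS         | Asian            | 914                  | 1,412      |
    | UKB_ASN_super | OLink    | BLISS (Super Learner) | Asian            | 914                  | 1,412      |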

  3. We used SNP names (rsIDs) to link the different datasets; all positions are on GRCh37.
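
Notes 2 and 3 can be sketched in Python as follows. This is a minimal illustration and not the BLISS implementation: the model names and ancestry labels are copied from the table in note 2, but the input layouts (plain dictionaries keyed by protein or rsID) and the helper names `models_for_ancestry`, `heritable_proteins`, and `link_by_rsid` are hypothetical.

```python
from typing import Dict, List, Tuple

# Model name -> ancestry label, as listed in the table of note 2.
MODEL_ANCESTRY: Dict[str, str] = {
    "UKB": "EUR",
    "ARIC": "EUR",
    "deCODE": "EUR",
    "UKB_AFR_std": "AFR",
    "UKB_AFR_super": "AFR",
    "ARIC_AA": "African American",
    "UKB_ASN_std": "Asian",
    "UKB_ASN_super": "Asian",
}

# Note 2: only proteins with cis-heritability exceeding 0.01 are analyzed.
CIS_H2_THRESHOLD = 0.01


def models_for_ancestry(ancestry: str) -> List[str]:
    """Return the models whose ancestry label matches the GWAS ancestry."""
    return [name for name, label in MODEL_ANCESTRY.items() if label == ancestry]


def heritable_proteins(cis_h2: Dict[str, float]) -> List[str]:
    """Keep proteins whose cis-heritability is strictly above the threshold."""
    return sorted(p for p, h2 in cis_h2.items() if h2 > CIS_H2_THRESHOLD)


def link_by_rsid(
    weights: Dict[str, float], gwas_z: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Note 3: join model weights and GWAS z-scores on the shared rsIDs."""
    shared = weights.keys() & gwas_z.keys()
    return {rsid: (weights[rsid], gwas_z[rsid]) for rsid in sorted(shared)}


if __name__ == "__main__":
    # Toy inputs, made up for illustration only.
    print(models_for_ancestry("EUR"))
    print(heritable_proteins({"P1": 0.20, "P2": 0.01, "P3": 0.011}))
    print(link_by_rsid({"rs1": 0.5, "rs2": -0.1}, {"rs2": 1.5, "rs3": 2.0}))
```

Joining on rsID rather than on position sidesteps coordinate mismatches between sources. SNPs present in only one dataset are dropped by the join, so it is worth checking how many remain per protein before testing.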


An atlas of genetic effects on the monocyte methylome across European and African populations.

Comprehensive Multi-Ancestry Methylome-Wide Association Study (MWAS)

This is a comprehensive multi-ancestry methylome-wide association study (MWAS) conducted on purified monocytes from European American (EA) and African American (AA) populations.

Key Features:

  • Whole-genome bisulfite sequencing (WGBS) data from 298 EA and 160 AA individuals
  • Analysis of over 25 million methylation sites
  • Identification of cis- and trans-methylation quantitative trait loci (meQTLs)
  • Development of population-specific DNA methylation imputation models
  • MWAS analysis of 41 complex traits using Million Veteran Program (MVP) data

Our study provides:

  • CpG-trait associations: Direct links between specific methylation sites and complex traits
  • Gene-trait associations: Aggregated effects of methylation on genes associated with various phenotypes

This resource bridges the gap between genomics and the monocyte methylome, offering insights into:

  • Genetic regulation of DNA methylation
  • Novel methylation-phenotype associations
  • Transferability of findings across diverse ancestries

The data presented here are valuable for researchers investigating the epigenetic basis of complex diseases, particularly those mediated by the immune system.

MWAS illustration

Team

GCB Hub aims to assemble a passionate team dedicated to open science, pulling together skills from a wide range of disciplines, including statistics, biostatistics, causal inference, epidemiology, drug development, clinical medicine, human genomics, as well as web and software development.

Leadership Team:

Chong Wu (MD Anderson) and Bingxin Zhao (UPenn)

Current and past members:

Zichen Zhang (MD Anderson), Xiaochen Yang (Purdue), Wanheng Zhang (MD Anderson)

Contact and license

We are dedicated to identifying causal biomarkers, including but not limited to proteins, genes (expression and splicing), and CpG sites, for complex traits and diseases in different domains. The resulting resources will make it possible to address many relevant scientific questions and will help researchers pinpoint the most promising targets for subsequent functional analysis, drug development, and repurposing.

If you have QTL datasets or would like to deposit your summary statistics to GCB Hub, please feel free to reach out to Chong Wu and Bingxin Zhao.

This work is licensed under the CC BY-NC-ND 4.0 license.
