Postdoc, Université Paris Cité
Organization: UR 7537 BioSTM (Biostatistique, Traitement et Modélisation des données biologiques)
Education level: PhD
Start date: September 2022
Contract duration: 2 years
Salary: According to experience
Description:
Keywords: statistical genetics; pleiotropy; complex traits and diseases; GWAS; post-genomic data; causal inference; Mendelian randomization; Gaussian mixture models; unsupervised and semi-supervised learning
One striking observation in human genetics today is that, as research advances in understanding
the genetic architecture of complex traits and the etiology of heritable diseases, new paradigms
keep emerging, revealing ever more of the complexity of biological systems. Indeed, the human
genome contains about 20,000 genes if we consider only the coding parts of the DNA, which is
hardly more than the nematode Caenorhabditis elegans, for example. Thus, the complexity of the
human organism, i.e., the great diversity of its cell types and functions, must instead result from
highly combinatorial and finely tuned regulation of the expression of these genes. Consequently,
each genetic element (e.g., a variant or a gene) is expected to influence several traits. This
phenomenon is called pleiotropy.
Although pleiotropy is extremely common and thought to play a central role in the genetic
architecture of human complex traits and diseases, it is one of the least understood
phenomena.
One of the most compelling lines of evidence for pleiotropy comes from genome-wide association
studies (GWASs), which estimate the effects of genetic variants across the whole genome on a
trait of interest. GWASs have identified countless genetic variants significantly associated with
many complex traits and diseases, with the same variants often associated with several traits,
most likely because of pleiotropy; yet in the vast majority of cases they cannot pinpoint the
underlying causal mechanism.
As a result, many applications and methodological developments have successfully reused GWAS
results, mainly to study relationships between traits. One booming field that uses GWAS summary
statistics is causal inference between traits through Mendelian randomization. The principle of
Mendelian randomization is simple and analogous to randomized controlled trials: variant alleles,
which are randomly allocated at conception, play the role of the drug/placebo assignment, and
their effects are modeled through regression to estimate and test the causal effect of an exposure
trait on an outcome trait. Although extremely appealing, Mendelian randomization relies on a strong
assumption: the absence of horizontal pleiotropy, which occurs when a variant has independent
effects on both the exposure and the outcome. Yet pleiotropy tended to be neglected in Mendelian
randomization applications. In a stepping-stone paper published in Nature Genetics in 2018, we
showed that horizontal pleiotropy cannot be neglected: it occurs in almost 50% of causal
relationships, biasing causal estimates and inflating the false discovery rate of causal
relationships.
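To make the regression principle and the bias from horizontal pleiotropy concrete, here is a minimal sketch using the fixed-effect inverse-variance weighted (IVW) estimator, one common Mendelian randomization estimator. It is shown only as an illustration, not as the method of the papers mentioned above; the function names and the summary statistics are hypothetical, made-up values. The IVW estimate is a weighted regression of the variant effects on the outcome on the variant effects on the exposure, through the origin, with weights 1/se_y².

```python
import math
from typing import List, Tuple


def wald_ratios(beta_x: List[float], beta_y: List[float]) -> List[float]:
    """Per-variant causal estimate: effect on the outcome divided by effect on the exposure."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def ivw_estimate(beta_x: List[float], beta_y: List[float],
                 se_y: List[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    Weighted regression of beta_y on beta_x through the origin with weights
    1 / se_y**2. Returns (causal effect estimate, standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("summary statistic vectors must have the same length")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Hypothetical GWAS summary statistics for five independent variants.
    beta_x = [0.10, 0.20, 0.15, 0.30, 0.25]  # variant effects on the exposure
    se_y = [0.01] * 5                         # standard errors of effects on the outcome
    true_effect = 0.5                         # simulated causal effect of exposure on outcome

    # No horizontal pleiotropy: variants affect the outcome only through the exposure.
    beta_y = [true_effect * bx for bx in beta_x]
    print("IVW without pleiotropy:", ivw_estimate(beta_x, beta_y, se_y))

    # Give the first variant a direct (horizontally pleiotropic) effect on the outcome.
    beta_y_pleio = beta_y.copy()
    beta_y_pleio[0] += 0.2
    print("IVW with one pleiotropic variant:", ivw_estimate(beta_x, beta_y_pleio, se_y))
    print("Per-variant Wald ratios:", wald_ratios(beta_x, beta_y_pleio))
```

With these numbers, the IVW estimate moves from 0.5 to about 0.589 (that is, 0.5 + 0.2 × 0.10 / 0.225) because of a single pleiotropic variant, whose own Wald ratio (about 2.5) stands out from the others (0.5). Here the direct effect is known by construction; in real data it is not, which is why detecting horizontal pleiotropy requires dedicated methods.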
On a related topic, in 2019 we published a proof-of-concept paper in Genome Biology in which we
not only detected horizontal pleiotropy but also showed that pleiotropy can be quantified at the
level of the genetic variants themselves, and that it is widespread across the human genome.
We now intend to go further. We have conceptualized five biological mechanisms leading to
pleiotropy: 1) linkage disequilibrium; 2) causality between traits; 3) genetic correlation between
traits; 4) high polygenicity of traits; 5) horizontal pleiotropy (truly independent effects of a
variant on two traits). We propose to build a comprehensive framework to disentangle all five
states of pleiotropy, to provide a genome-wide map of pleiotropy for genetic variants, and to infer
causal relationships between traits using machine learning. Specifically, we propose 1) to improve
on the method published in our proof-of-concept paper using unsupervised approaches based on
penalized methods, random forests or deep learning; and 2) to explore semi-supervised learning
using an original data-labeling strategy that we have developed. Human genetic variant databases
are increasingly useful, from the interpretation of genetic analyses to clinical interpretation. We
strongly believe that a database describing the pleiotropic nature of variants will complement
existing databases and serve the community. Importantly, the full code of the methodology and
the genome-wide map of pleiotropy will be made publicly available and presented in scientific
publications.
More information: http://marie.verbanck.free.fr/PostdocPosition.pdf
Contact: marie.verbanck@u-paris.fr