Stefania Benonisdottir, lead author of the study and a Doctoral candidate from the Big Data Institute, explains, ‘Currently, most genetic studies are based on genetic databases which contain large numbers of participants and a wealth of information. However, some people are more likely to be included in these databases than others, which can create a problem called ascertainment bias, where the genetic data collected is not representative of the intended study population.’
To study this link between genetic data and participation bias, the researchers turned to one of the largest biomedical databases in the world, the UK Biobank, which contains information from half a million participants.
Using UK Biobank data, the researchers found that there is a genetic component to people's likelihood of participating, one that is correlated with, but distinct from, other human traits. Published today in Nature Genetics, the study highlights that participation could be an important human trait that has previously been underappreciated, and introduces a statistical framework that could lead to more accurate analyses of genetic data.
Professor Augustine Kong, senior author from the Leverhulme Centre for Demographic Science and the Big Data Institute, notes, ‘Ascertainment bias poses a statistical challenge in genetics research, particularly in the era of big data. Adjustments for this bias often rely on known differences between participants and non-participants, introducing imperfections when answering questions involving variables only observed for participants, such as genotypes. Our study identifies detectable footprints of participation bias in the genetic data of participants, which can be exploited statistically to enhance research accuracy for both participants and non-participants alike.’
Genome-wide association studies offer important insights into the role of genetics in human health and diseases. However, such studies can be affected by biases, which arise when genetic databases are not representative of the intended study population. Now, the identified genetic inclination to participate can help scientists assess the representativeness of their study sample.
By analysing the genetic data of over 30,000 related participants of white British descent from the UK Biobank, the researchers found that the genetic component underlying participation in the study is correlated with, but distinct from, the genetic components of traits such as educational attainment and body mass index.
For example, the correlation between the genetic components underlying participation in the UK Biobank and educational attainment is estimated to be 36.6%. This result is consistent with some of the previously reported differences between participants and non-participants, but it also shows that participation bias is not fully captured by these previously known differences. In other words, participation is not simply a consequence of these other traits and characteristics.
The study also found the genetic component of participation can be passed down through families and may affect people's participation in many different studies over their lifetimes. This highlights the potential for bias in genetic research and underscores the importance of accounting for such biases in study design and analysis.
Professor Melinda Mills, Director of the Leverhulme Centre, concludes, ‘As our GWAS Diversity Monitor shows, the road to improve diversity in genome-wide association studies is long. However, this statistical framework is a huge step in the right direction to mitigate the risk of incomplete or inaccurate data analysis and ensure that genetic research truly benefits everyone.’
The study, 'Studying the genetics of participation using footprints left on the ascertained genotypes', is available at Nature Genetics: https://www.nature.com/articles/s41588-023-01439-2