Review of the quality control checks performed by current genome-wide and targeted-genome association studies on Myalgic Encephalomyelitis/Chronic Fatigue Syndrome, by Nuno Sepulveda, Anna D Grabowska, Eliana M Lacerda, Luis C Nacul in Frontiers in Pediatrics. May 7, 2020 [doi: 10.3389/fped.2020.00293]
Research article Introduction:
Myalgic encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a debilitating disease characterized by persistent fatigue and post-exertional malaise, accompanied by other symptoms. The direct cause of the disease remains elusive, but it may involve genetic factors alongside environmental triggers, such as severe microbial infections and other stressors.
Aiming to identify putative genetic factors that could explain the pathophysiological mechanisms of ME/CFS, four genome-wide association studies (GWAS) and two targeted-genome association studies (TGAS) were conducted in the past decade (5–10). In the four GWAS, thousands of genetic markers located across the whole genome were evaluated for their statistical association with ME/CFS (5–8). The two TGAS had the same statistical objective as the four GWAS but instead investigated the association of the disease with numerous genetic markers located in candidate genes related to inflammation and immunity (9) and in genes encoding diverse adrenergic receptors (10).
These studies produced conflicting evidence of genetic association with ME/CFS, ranging from no association (7), through mild association (10), to moderate associations for a relatively small number of genetic markers (5,6,9). The most optimistic GWAS suggested more than 5,500 candidate gene-disease associations (8).
This inconsistency in the reported findings prompted us to review the respective data. To this end, the present opinion paper first revisits the recommended quality control (QC) checks for GWAS and TGAS, and then summarizes which ones were performed by those studies on ME/CFS…
Discussion
This opinion paper shows that most published genetic association studies on ME/CFS performed only partial QC checks, the exception being the study by Herrera et al (7). Assessing which QC checks were performed is essential to ascertain the quality of the respective genetic data. In this regard, the genetic data from Perez et al (8) deserve further analysis to ascertain the validity of the reported findings. Such an assessment can follow the QC steps outlined here and performed in exemplary fashion by Herrera et al (7). The remaining studies could also benefit from an additional quality check of the heterozygosity rate so that possible sample contamination can be ruled out.
The absence of this check does not in itself invalidate the genetic data of these studies. We could have performed the check ourselves had the corresponding genetic data been available in an open-access repository or as a supplementary file to the respective publication, a data-sharing practice followed by several ME/CFS researchers (13–15).
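As a rough illustration of the heterozygosity check mentioned above, the following minimal Python sketch computes a per-sample heterozygosity rate and flags samples whose rate deviates strongly from the cohort mean. The genotype coding (0, 1, 2 copies of the minor allele, `None` for a missing call), the function names, the toy cohort, and the threshold of three standard deviations are all assumptions made for this example; they are not taken from the reviewed studies, and a real analysis would typically restrict the calculation to a suitable set of autosomal markers.

```python
from statistics import mean, stdev


def heterozygosity_rate(genotypes):
    """Fraction of non-missing genotype calls that are heterozygous.

    Genotypes are coded as minor-allele counts (0, 1, 2); None is a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("sample has no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def flag_heterozygosity_outliers(samples, n_sd=3.0):
    """Return sorted IDs of samples whose rate lies more than n_sd SDs from the mean.

    `samples` maps a sample ID to its list of genotype calls.
    The n_sd threshold is an assumed convention, not a fixed rule.
    """
    rates = {sid: heterozygosity_rate(calls) for sid, calls in samples.items()}
    values = list(rates.values())
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return []  # all samples identical: nothing to flag
    return sorted(sid for sid, r in rates.items() if abs(r - mu) > n_sd * sd)


# Toy cohort (hypothetical data): 19 typical samples with 5 to 7 heterozygous
# calls out of 20 markers, plus one sample with 16 heterozygous calls.
cohort = {f"S{i + 1:02d}": [1] * (5 + i % 3) + [0] * (15 - i % 3) for i in range(19)}
cohort["S20"] = [1] * 16 + [2] * 4

print(flag_heterozygosity_outliers(cohort))  # prints ['S20']
```

An excess of heterozygous calls, as in the hypothetical sample `S20`, is the pattern one would expect when DNA from more than one individual is mixed, whereas an unusually low rate can point to other problems such as inbreeding.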
Consequently, it is unclear whether aberrant heterozygosity rates (due to sample contamination) are one of the explanations for the conflicting evidence of genetic association reported by these studies. In this regard, Herrera et al (7) excluded five of their 109 samples (about 5%) based on the heterozygosity rate. In simple statistical applications with large sample sizes, 5% sample contamination might be too low to have a substantial impact on the findings. However, in the specific context of GWAS and TGAS, where stringent significance levels are used to control for multiple testing, such a level of sample contamination could reduce statistical power and leave relevant disease-gene associations undetected.
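The argument about stringent significance levels can be illustrated with a small power calculation. The sketch below uses a normal approximation for a two-sided test comparing allele frequencies between cases and controls. It rests on several assumptions that are not from the reviewed studies: contaminated case samples are crudely treated as carrying no disease signal (their allele frequency equals that of controls), and the sample sizes, allele frequencies, and number of markers are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    norm = NormalDist()
    # Each diploid individual contributes two alleles.
    alleles_case, alleles_control = 2 * n_cases, 2 * n_controls
    se = (p_case * (1 - p_case) / alleles_case
          + p_control * (1 - p_control) / alleles_control) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    # Probability that the test statistic falls in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


def contaminated_frequency(p_case, p_control, contamination):
    """Observed case allele frequency if a fraction of case samples carries no signal.

    Assumption for this sketch: contaminated samples behave like controls.
    """
    return (1 - contamination) * p_case + contamination * p_control


# Hypothetical scenario: 1000 cases, 1000 controls, 500,000 tested markers.
p_case, p_control, n = 0.35, 0.25, 1000
alpha_nominal = 0.05
alpha_bonferroni = 0.05 / 500_000  # Bonferroni-corrected significance level

p_observed = contaminated_frequency(p_case, p_control, contamination=0.05)
for label, alpha in [("nominal", alpha_nominal), ("Bonferroni", alpha_bonferroni)]:
    clean = allelic_test_power(p_case, p_control, n, n, alpha)
    dirty = allelic_test_power(p_observed, p_control, n, n, alpha)
    print(f"{label}: power {clean:.4f} without and {dirty:.4f} with 5% contamination")
```

Under these assumed values, the power at the nominal level is essentially unaffected by the contamination, while the power at the Bonferroni-corrected level drops noticeably, which is the qualitative point made in the text.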
Besides the partial QC checks, the investigated genetic data on ME/CFS suffer from the lack of an objective biomarker for disease diagnosis. A similar problem can be envisioned for other complex diseases lacking a biomarker, such as fibromyalgia and Gulf War syndrome. The absence of a biomarker is likely to introduce misclassification of the true disease status of the recruited patients (16). To illustrate this putative problem, nine of the 61 patients recruited by Herrera et al (7) on the basis of the Centers for Disease Control criteria (1) and the Canadian Consensus Criteria (2) were obese (body mass index of 35 kg/m² or higher). Although the respective association analysis controlled for body mass index and known diseases were excluded, it is unclear whether the obesity observed in these patients was a direct consequence of ME/CFS or was instead caused by another ongoing disease strongly associated with fatigue.
A solution to this problem is to use more advanced statistical methodology in which misclassification is directly included in the data analysis (17,18). However, given the complexity of this methodology, we argue for a stronger collaboration between the ME/CFS research community and statistical geneticists. In principle, this collaboration is expected to promote better statistical analyses, improve data interpretation and, ultimately, yield a better assessment of the genetic component in ME/CFS.
In summary, given the partial QC checks performed in current GWAS and TGAS, the question of a genetic component in ME/CFS remains open for investigation. To accelerate the discovery of promising disease-gene associations, future genetic studies of ME/CFS should set data and methodological standards as high as those followed by the 1000 Genomes Project and the UK10K project (19,20). Data sharing should also be a general practice, giving the research community the opportunity to perform additional checks or alternative analyses of the same data.
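To make the effect of misclassification concrete, the following sketch shows how an allelic odds ratio is pulled toward 1 when some recruited patients are not true cases. It assumes, purely for illustration, that misclassified patients have the same allele frequency as controls and that controls are classified correctly; the function names and the numerical values are hypothetical and do not describe any of the reviewed studies.

```python
def odds(p):
    """Odds corresponding to a probability strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1 - p)


def observed_odds_ratio(p_case, p_control, misclassified):
    """Allelic odds ratio when a fraction of recruited 'cases' are not true cases.

    Assumption for this sketch: misclassified patients carry the control
    allele frequency, so the observed case frequency is a mixture.
    """
    if not 0 <= misclassified <= 1:
        raise ValueError("misclassified must lie between 0 and 1")
    p_observed = (1 - misclassified) * p_case + misclassified * p_control
    return odds(p_observed) / odds(p_control)


# Hypothetical allele frequencies: 0.35 in true cases, 0.25 in controls.
for fraction in (0.0, 0.1, 0.2):
    ratio = observed_odds_ratio(0.35, 0.25, fraction)
    print(f"misclassified fraction {fraction:.1f}: observed odds ratio {ratio:.3f}")
```

Under these assumptions the observed odds ratio decreases steadily as the misclassified fraction grows, so a real association would appear weaker than it is.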