In two-sample MR, two datasets of summary-level statistics, one describing the single nucleotide polymorphism (SNP)-exposure association and the other the SNP-outcome association, are combined to estimate the causal effect of the exposure on the outcome. To do this, the alleles within the two datasets must be harmonized such that the effect allele (and the corresponding beta and effect allele frequency) within the outcome dataset (i.e., the dataset providing estimates of the SNP-outcome association) reflects the same effect allele as in the exposure data (i.e., the dataset providing estimates of the SNP-exposure association). This can be done by inferring the deoxyribonucleic acid (DNA) strand (i.e., forward or reverse) or by utilizing the effect allele frequencies within both datasets. Before harmonization, the exposure data will usually be orientated such that the SNP-exposure associations are all consistent in direction (e.g., all effect alleles will be associated with an increase in the exposure). In fact, some MR methods (e.g., the MR-Egger method) require this orientation.
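The orientation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular package: the record type `SnpAssoc`, its field names, and the function `orient_to_positive` are hypothetical, and the sketch assumes each SNP has exactly two alleles.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SnpAssoc:
    """One SNP-exposure association (hypothetical field names)."""
    snp: str
    effect_allele: str
    other_allele: str
    beta: float
    eaf: float  # effect allele frequency


def orient_to_positive(assoc: SnpAssoc) -> SnpAssoc:
    """Re-express the association so that the effect allele increases the exposure."""
    if assoc.beta >= 0:
        return assoc
    # Swapping the effect allele negates beta and turns EAF into 1 - EAF.
    return replace(
        assoc,
        effect_allele=assoc.other_allele,
        other_allele=assoc.effect_allele,
        beta=-assoc.beta,
        eaf=1.0 - assoc.eaf,
    )


oriented = orient_to_positive(SnpAssoc("rs_example", "A", "G", -0.2, 0.3))
print(oriented.effect_allele, oriented.beta)
```

In this sketch the outcome data are left untouched; they would be aligned to the re-oriented exposure alleles in the harmonization step that follows.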
When harmonizing summary-level data, if the effect allele of a particular SNP is the same in the exposure and outcome data, the SNP is considered to be harmonized. If, however, the effect allele in the outcome data is not the same as that in the exposure data, the beta (i.e., the association between the SNP and the outcome) and the effect allele frequency need to be harmonized such that they reflect the same effect allele as in the exposure data. This can be achieved by multiplying the beta in the outcome data by (-1) and subtracting the effect allele frequency in the outcome data from 1.
However, harmonization can be challenging with palindromic SNPs (i.e., SNPs whose two alleles are complementary bases, A/T or C/G, and so read the same on both strands), especially those with high minor allele frequencies, as it can be difficult to infer the DNA strand and, thus, whether the effect alleles are the same across the exposure and outcome datasets. With palindromic SNPs, the allele frequency (if available) can be used to infer the strand and, thus, whether the effect alleles are consistent across datasets. If this information is not available, the options for harmonizing such SNPs are to (i) assume that all SNPs in both datasets have been presented in the same way (i.e., both on the forward strand), based on what is observed for the non-palindromic SNPs, or (ii) drop all palindromic SNPs for which it is not possible to infer the direction. This second option may also be taken if the effect allele frequency of a palindromic SNP is close to 0.5, and thus the strand cannot be confidently inferred. To reduce errors in harmonization, it is advised to harmonize using automated scripts that have been thoroughly tested (for example, those available in MR-Base) and to check the correlation between effect allele frequencies before and after harmonizing.
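The rules above can be illustrated with a short Python sketch for a single SNP. It is a simplified illustration only and does not reproduce the MR-Base scripts: the names `Harmonized` and `harmonize` are hypothetical, alleles are assumed to be single upper-case bases (A, C, G or T), and the cut-off `maf_cutoff=0.42` for calling a palindromic SNP ambiguous is an arbitrary choice made for the example.

```python
from typing import NamedTuple, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


class Harmonized(NamedTuple):
    beta: Optional[float]  # outcome beta aligned to the exposure effect allele
    eaf: Optional[float]   # outcome effect allele frequency, same alignment
    status: str            # "aligned", "flipped", or "dropped: <reason>"


def _flip(beta: float, eaf: Optional[float]) -> Harmonized:
    # Switching the effect allele negates beta and replaces EAF by 1 - EAF.
    return Harmonized(-beta, None if eaf is None else 1.0 - eaf, "flipped")


def harmonize(exp_ea, exp_oa, exp_eaf, out_ea, out_oa, out_beta, out_eaf,
              maf_cutoff=0.42) -> Harmonized:
    """Align one outcome association to the exposure effect allele."""
    if COMPLEMENT[exp_ea] == exp_oa:  # palindromic SNP: A/T or C/G
        if {out_ea, out_oa} != {exp_ea, exp_oa}:
            return Harmonized(None, None, "dropped: incompatible alleles")
        if exp_eaf is None or out_eaf is None:
            return Harmonized(None, None, "dropped: palindromic, no frequency")
        if (min(exp_eaf, 1 - exp_eaf) > maf_cutoff
                or min(out_eaf, 1 - out_eaf) > maf_cutoff):
            return Harmonized(None, None, "dropped: palindromic, frequency near 0.5")
        # Allele letters are uninformative here, so compare on which side
        # of 0.5 the two effect allele frequencies fall.
        if (exp_eaf < 0.5) == (out_eaf < 0.5):
            return Harmonized(out_beta, out_eaf, "aligned")
        return _flip(out_beta, out_eaf)

    # Non-palindromic SNP: try the alleles as given, then on the other strand.
    for ea, oa in ((out_ea, out_oa), (COMPLEMENT[out_ea], COMPLEMENT[out_oa])):
        if (ea, oa) == (exp_ea, exp_oa):
            return Harmonized(out_beta, out_eaf, "aligned")
        if (ea, oa) == (exp_oa, exp_ea):
            return _flip(out_beta, out_eaf)
    return Harmonized(None, None, "dropped: incompatible alleles")


# Outcome reported for the other allele: beta changes sign, EAF becomes 1 - EAF.
print(harmonize("A", "G", 0.30, "G", "A", 0.10, 0.70))
# Palindromic SNP with frequencies close to 0.5: strand cannot be inferred.
print(harmonize("A", "T", 0.48, "A", "T", 0.10, 0.47))
```

Returning a status string alongside the aligned values keeps a record of which SNPs were flipped or dropped, which makes it easier to report the harmonization and to rerun analyses without the ambiguous variants.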
It is also useful to provide pre- and post-harmonization datasets to allow assessment of the quality of the harmonization, and to perform sensitivity analyses that evaluate the influence of variants that are difficult to harmonize (e.g., palindromic SNPs with high minor allele frequency) by, for example, presenting results with and without those variants included.
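As a rough illustration of the frequency check, the sketch below compares the correlation between exposure and outcome effect allele frequencies before and after harmonization. The frequencies and the `flip` flags are made-up values for the example, and `pearson` is a hypothetical helper written out so that the sketch needs no third-party library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Made-up effect allele frequencies for four SNPs.
exposure_eaf = [0.10, 0.25, 0.40, 0.80]
outcome_eaf_before = [0.90, 0.26, 0.61, 0.79]
# Hypothetical result of harmonization: SNPs 1 and 3 had their effect allele switched.
flip = [True, False, True, False]
outcome_eaf_after = [1 - f if switched else f
                     for f, switched in zip(outcome_eaf_before, flip)]

r_before = pearson(exposure_eaf, outcome_eaf_before)
r_after = pearson(exposure_eaf, outcome_eaf_after)
print(f"before: {r_before:.2f}, after: {r_after:.2f}")
```

If the two samples come from similar populations, the correlation after harmonization would be expected to be close to 1; a low value suggests that some effect alleles are still misaligned.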
References
- Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 2018; 7.
- Lawlor DA. Two-sample Mendelian randomization: opportunities and challenges. Int J Epidemiol 2016; 45: 908-915.
- Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol 2016; 45: 1717-1726.
Other terms in 'Sources of bias and limitations in MR':
- Assortative mating
- Canalization
- Collider
- Collider bias
- Conditional F-statistic for multiple exposures
- Confounding
- Exclusion restriction assumption
- F-statistic
- Homogeneity Assumption
- Horizontal Pleiotropy
- Independence assumption
- INstrument Strength Independent of Direct Effect (InSIDE) assumption
- Intergenerational (or dynastic) effects
- Monotonicity assumption
- MR for testing critical or sensitive periods
- MR for testing developmental origins
- No effect modification assumption
- NO Measurement Error (NOME) assumption
- Non-linear MR
- Non-overlapping samples (in two-sample MR)
- Overfitting
- Pleiotropy
- Population stratification
- R-squared
- Regression dilution bias (attenuation by errors)
- Relevance assumption
- Reverse causality
- Same underlying population (in two-sample MR)
- Statistical power and efficiency
- Vertical pleiotropy
- Weak instrument bias
- Winner's curse