Extension of two-sample MR, whereby an additional dataset is used to screen and select instrumental variables (IVs) associated with the exposure before MR analyses.
Traditionally, MR studies select IVs from a genome-wide association study (GWAS) of the exposure, conducted in a single study or across multiple studies, by taking the genetic variants that reach a genome-wide significance threshold (conventionally p < 5e-08). In a one-sample MR setting, those IVs are then extracted from individual-level genetic data and used to calculate the causal effect estimate of the exposure on the outcome. In a traditional two-sample MR setting, summary-level data for those IVs are extracted from that same exposure GWAS and from a separate GWAS of the outcome to estimate the causal effect of the exposure on the outcome.
However, using the same dataset both to select IVs and to perform MR analyses can lead to bias through winner's curse and model overfitting. Additionally, in the presence of weak IVs, the causal effect estimate can be biased towards either the confounded observational association (in a one-sample MR setting) or the null (in a two-sample MR setting). To reduce these biases, one-sample MR studies usually select IVs from a GWAS of the exposure that is independent of the individual-level data used to calculate the causal effect estimate.
Similarly, in a two-sample MR setting, studies can use a second, independent GWAS of the exposure (again from a single study or multiple studies) to restrict the set of IVs to those that meet the same (or a prespecified) p-value threshold, and use this second GWAS as the source of the beta-coefficients for the IV-exposure associations. This gives more confidence in both the IVs being used in the MR analysis and their effect sizes. The use of two GWASs to source information about the IVs for the exposure, plus one GWAS of the outcome, led to the name "three-sample MR".
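
The three-sample workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `three_sample_ivw`, the dictionary layouts and all numbers are hypothetical, and the fixed-effect inverse-variance weighted (IVW) estimator is an assumed choice, since the entry does not say which MR estimator is applied. Harmonisation of effect alleles and linkage disequilibrium clumping are omitted.

```python
from math import fsum, sqrt

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def three_sample_ivw(selection_gwas, replication_gwas, outcome_gwas,
                     selection_p=GENOME_WIDE_P, replication_p=GENOME_WIDE_P):
    """Hypothetical helper illustrating three-sample MR.

    selection_gwas:   {variant: p_value}                 (exposure GWAS 1)
    replication_gwas: {variant: (beta_exposure, p_value)} (exposure GWAS 2)
    outcome_gwas:     {variant: (beta_outcome, se_outcome)}
    """
    # Step 1: select IVs in the first exposure GWAS, then keep only those
    # that also pass the threshold in the independent second exposure GWAS.
    ivs = sorted(
        v for v, p in selection_gwas.items()
        if p < selection_p
        and v in replication_gwas and replication_gwas[v][1] < replication_p
        and v in outcome_gwas
    )
    if not ivs:
        raise ValueError("no variant passed both exposure GWAS thresholds")

    # Step 2: IV-exposure betas come from the second exposure GWAS only;
    # IV-outcome betas and standard errors come from the outcome GWAS.
    weighted_products, weighted_squares = [], []
    for v in ivs:
        beta_x = replication_gwas[v][0]
        beta_y, se_y = outcome_gwas[v]
        weight = 1.0 / se_y ** 2
        weighted_products.append(weight * beta_x * beta_y)
        weighted_squares.append(weight * beta_x ** 2)

    # Step 3: fixed-effect IVW estimate (an assumed estimator choice).
    denom = fsum(weighted_squares)
    estimate = fsum(weighted_products) / denom
    std_error = sqrt(1.0 / denom)
    return ivs, estimate, std_error


# Made-up summary statistics, for illustration only.
selection = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 1e-3, "rs4": 1e-12}
replication = {"rs1": (0.2, 1e-9), "rs2": (0.4, 1e-10),
               "rs3": (0.3, 1e-9), "rs4": (0.1, 0.01)}
outcome = {"rs1": (0.1, 0.02), "rs2": (0.2, 0.04),
           "rs3": (0.5, 0.05), "rs4": (0.3, 0.05)}

ivs, estimate, std_error = three_sample_ivw(selection, replication, outcome)
print(ivs, estimate, std_error)
```

In this toy input, `rs3` fails the threshold in the first exposure GWAS and `rs4` fails it in the second, so only `rs1` and `rs2` contribute to the estimate. The first GWAS is used solely for selection and never supplies effect sizes.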