Indian Journal of Human Genetics
RESEARCH ARTICLE
Year : 2011  |  Volume : 17  |  Issue : 4  |  Page : 27-31
 

Power comparison between population-based case-control studies and family-based transmission-disequilibrium tests: An empirical study


Human Genetics Unit, Indian Statistical Institute, Kolkata, West Bengal, India

Date of Web Publication: 3-May-2011

Correspondence Address:
Saurabh Ghosh
Human Genetics Unit, Indian Statistical Institute, Kolkata 203, B.T. Road, Kolkata - 700 108
India

Source of Support: Fogarty International Center, National Institutes of Health, USA, grant R01 TW 006604-05 to Saurabh Ghosh and by the fellowship 09/093(0111)/2008-EMR-I to Tanushree Haldar from the Council of Scientific and Industrial Research (CSIR). Conflict of Interest: None


DOI: 10.4103/0971-6866.80355


 

   Abstract 

Background: There are two major classes of genetic association analyses: population based and family based. Population-based case-control studies have been the method of choice due to the ease of data collection. However, population stratification is one of the major limitations of case-control studies, while family-based studies are protected against stratification. In this study, we carry out extensive simulations under different disease models (both Mendelian as well as complex) to evaluate the relative powers of the two approaches in detecting association.
Materials and Methods: The power comparisons are based on a case-control design comprising 200 cases and 200 controls versus a Transmission Disequilibrium Test (TDT) or Pedigree Disequilibrium Test (PDT) design with 200 informative trios. We perform the allele-level test for case-control studies, which is based on the difference of allele frequencies at a single nucleotide polymorphism (SNP) between unrelated cases and controls. The TDT and the PDT are based on preferential allelic transmissions at a SNP from heterozygous parents to the affected offspring. We considered five disease modes of inheritance: (i) recessive with complete penetrance, (ii) dominant with complete penetrance and (iii), (iv) and (v) complex diseases with varying levels of penetrance and phenocopies.
Results: We find that while the TDT/PDT design with 200 informative trios is in general more powerful than a case-control design with 200 cases and 200 controls (except when the heterozygosity at the marker locus is high), it may be necessary to sample a very large number of trios to obtain the requisite number of informative families.
Conclusion: The current study provides insights into power comparisons between population-based and family-based association studies.


Keywords: Allelic association, informative trios, complex genetic disorder


How to cite this article:
Haldar T, Ghosh S. Power comparison between population-based case-control studies and family-based transmission-disequilibrium tests: An empirical study. Indian J Hum Genet 2011;17, Suppl S1:27-31

How to cite this URL:
Haldar T, Ghosh S. Power comparison between population-based case-control studies and family-based transmission-disequilibrium tests: An empirical study. Indian J Hum Genet [serial online] 2011 [cited 2016 May 13];17, Suppl S1:27-31. Available from: http://www.ijhg.com/text.asp?2011/17/4/27/80355


Association mapping of susceptibility genes underlying complex disorders is an active area of current research in genetic epidemiology. Compared with Mendelian disorders, there has been limited success in identifying genes involved in complex disorders, as these traits are believed to be controlled by multiple loci, some with minor gene effects, and genetic variation at any one locus does not completely determine the trait. Moreover, epistatic as well as gene-environment interactions often modify the risk of developing the disease. While linkage analyses [1] have traditionally been successful in identifying rare variants with large genetic effect sizes characterizing Mendelian disorders, they have been relatively unsuccessful in detecting common variants with moderate effect sizes characterizing complex disorders. There is evidence that association studies, which measure the extent of linkage disequilibrium (LD) between alleles of two loci, [2] are statistically more powerful than linkage studies in gene mapping of complex traits. [3] This is because LD exists over small distances on the genome, while linkage exists over larger distances. Thus, a positive association finding gives a more precise location of a locus responsible for the trait. The most popular design for genetic association studies is the population-based case-control study, owing to the ease of data collection and the simplicity of the statistical methodology for testing association. However, such studies suffer from a major inherent limitation: the problem of population stratification. [4] If the sample is a mixture of genetically heterogeneous subpopulations (i.e., there is heterogeneity in allele frequencies at the SNPs across subpopulations), the association finding may be spurious. This problem is of specific relevance for studies on Indian populations due to the increasing evidence of genetic heterogeneity among different ethnic populations in India. [5],[6],[7] While there are some statistical methods [8],[9],[10] to adjust for population stratification, it remains unclear how many genome-wide markers are required to evaluate the level of stratification, and to what extent the relevant statistics can be corrected. Thus, it has been of interest to explore family-based alternatives that attempt to detect patterns of preferential transmission of a specific parental allele to the offspring, the best known being the Transmission Disequilibrium Test (TDT). [11] The major advantage of this test is that it is protected against population stratification, although it requires more demanding data than case-control studies.

In this study, we carry out extensive simulations to compare the statistical powers of population-based case-control analyses and the family-based TDT and Pedigree Disequilibrium Test (PDT) [12] for a wide spectrum of genetic disease models. The major challenge lies in the fact that a direct and straightforward power comparison is not possible in the strict statistical sense because the study designs are different with respect to data requirements.


   Materials and Methods


We have performed the allele-level test for case-control studies, which is based on the difference of allele frequencies at a single nucleotide polymorphism (SNP) between unrelated cases and controls. Under the null hypothesis of no allelic association, the test statistic follows a Chi-square distribution with 1 d.f. The TDT and the PDT are based on preferential allelic transmissions at a SNP from heterozygous parents to the affected offspring. Although the PDT has been designed to incorporate large pedigrees, we have restricted our PDT analyses to trios (two parents and an offspring) for meaningful sample size comparisons with the classical TDT. The test statistics of both TDT and PDT follow a Chi-square distribution with 1 d.f. under the null hypothesis of no linkage or no association. However, in order to compare the powers of the two designs, the null hypotheses need to be consistent. We therefore simulated the TDT/PDT data in the presence of linkage, so that these tests are, in effect, tests of association only. The power comparisons are based on a case-control design comprising 200 cases and 200 controls versus a TDT or PDT design with 200 informative trios (i.e., trios having at least one parent heterozygous at the marker locus). Because, in practice, informative trios cannot be identified before genotyping, we have determined the number of trios that need to be screened to obtain 200 informative trios. We have also estimated the number of cases required in a case-control design with an equal number of cases and controls to obtain power equivalent to that of the TDT/PDT design.
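The two simplest statistics described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the allele-level test is the Pearson Chi-square on the 2x2 table of allele counts without continuity correction, it uses the classical TDT statistic of Spielman et al. [11], and it omits the PDT. All function names and the example counts are hypothetical.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a Chi-square variable with 1 d.f."""
    # For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson Chi-square (assumed: no continuity correction) on allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator

def tdt_chi2(transmitted, not_transmitted):
    """Classical TDT statistic; counts are taken over heterozygous parents only."""
    return (transmitted - not_transmitted) ** 2 / (transmitted + not_transmitted)

# Hypothetical counts, for illustration only.
# 200 cases and 200 controls contribute 400 alleles each.
cc_stat = allelic_chi2(160, 240, 120, 280)
# 100 heterozygous parents: allele transmitted 60 times, not transmitted 40 times.
tdt_stat = tdt_chi2(60, 40)

print(round(cc_stat, 3), chi2_1df_pvalue(cc_stat))
print(tdt_stat, chi2_1df_pvalue(tdt_stat))
```

Both statistics are referred to the same Chi-square distribution with 1 d.f., which is what makes a power comparison at a common significance level meaningful.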
We have considered five disease modes of inheritance: (i) recessive with complete penetrance (an individual is affected if and only if he/she has two copies of the disease allele), (ii) dominant with complete penetrance (an individual is affected if and only if he/she has at least one copy of the disease allele) and (iii), (iv) and (v) complex diseases with varying levels of penetrances and phenocopies (none of the risk genotypes completely determines the disease and some individuals manifest the disease in spite of not possessing any risk allele).
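As a quick consistency check on these disease models, the population prevalence can be computed from the risk allele frequency and the genotype penetrances under Hardy-Weinberg proportions. The sketch below uses the parameter values reported in the Results; it assumes that the three penetrances quoted there refer to two, one and zero copies of the risk allele, in that order (the text does not state the ordering explicitly, but this reading reproduces the reported prevalences). The names `hwe`, `prevalence` and `MODELS` are illustrative.

```python
def hwe(q):
    """Hardy-Weinberg proportions for 0, 1 and 2 copies of an allele with frequency q."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)

def prevalence(q, penetrance):
    """Population prevalence; penetrance[g] = P(affected | g copies of the risk allele)."""
    return sum(freq * f for freq, f in zip(hwe(q), penetrance))

# (risk allele frequency, penetrances for 0, 1, 2 copies).
# Assumption: the ordering of the quoted penetrances is 2, 1, 0 copies.
MODELS = {
    "recessive": (0.30, (0.00, 0.00, 1.00)),
    "dominant":  (0.05, (0.00, 1.00, 1.00)),
    "complex_3": (0.10, (0.05, 0.25, 0.50)),
    "complex_4": (0.05, (0.05, 0.15, 0.30)),
    "complex_5": (0.10, (0.05, 0.10, 0.25)),
}

for name, (q, pen) in MODELS.items():
    print(name, round(100 * prevalence(q, pen), 2), "%")
```

In the complex models, the non-zero penetrance for zero copies of the risk allele is what produces phenocopies.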

For the case-control design, the genotypes are generated conditioned on the disease (case/control) status under the assumption of Hardy-Weinberg genotypic proportions. For the TDT/PDT analyses, the genotypes of the parents are generated (using Hardy-Weinberg proportions) to determine whether the trio could be informative and, if so, the genotype of the offspring is generated conditioned on the parental genotypes. The affection status of the offspring is generated conditioned on the genotype at the disease locus, and the family is considered for analysis only if the offspring is affected. The powers are determined empirically based on 1000 replicated sets of simulated data.
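The case-control part of this scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulation: the tested marker is taken to be the disease locus itself (so LD between a separate marker and the disease locus is not modelled), controls are taken to be unaffected individuals, and a nominal 5% level (critical value 3.841) is assumed because the significance level is not stated in the text. All names are hypothetical.

```python
import random

def hwe(q):
    """Hardy-Weinberg proportions for 0, 1 and 2 copies of the risk allele."""
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]

def genotype_dist_given_status(q, penetrance, affected):
    """P(genotype | disease status) by Bayes' rule under Hardy-Weinberg proportions."""
    joint = [g * (f if affected else 1.0 - f) for g, f in zip(hwe(q), penetrance)]
    total = sum(joint)
    return [j / total for j in joint]

def allelic_chi2(a, b, c, d):
    """Pearson Chi-square on the 2x2 allele-count table (0.0 if a margin is empty)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def empirical_power(q, penetrance, n=200, reps=1000, critical=3.841, seed=1):
    """Fraction of replicates in which the allelic test exceeds the critical value."""
    rng = random.Random(seed)
    case_dist = genotype_dist_given_status(q, penetrance, True)
    ctrl_dist = genotype_dist_given_status(q, penetrance, False)
    rejections = 0
    for _ in range(reps):
        # Number of risk alleles carried by n cases and by n controls.
        case_risk = sum(rng.choices((0, 1, 2), weights=case_dist, k=n))
        ctrl_risk = sum(rng.choices((0, 1, 2), weights=ctrl_dist, k=n))
        stat = allelic_chi2(case_risk, 2 * n - case_risk,
                            ctrl_risk, 2 * n - ctrl_risk)
        rejections += stat > critical
    return rejections / reps

# Third disease model: risk allele frequency 0.1, penetrances 0.05, 0.25, 0.5
# for 0, 1, 2 copies (ordering assumed).
print(empirical_power(0.1, (0.05, 0.25, 0.5)))
```

Setting all three penetrances equal removes the association, so the same function then estimates the empirical type I error rate rather than the power.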


   Results


The results of the power comparisons of the case-control and the TDT/PDT designs are presented in [Table 1],[Table 2],[Table 3],[Table 4],[Table 5], corresponding to the five disease models considered: the first, a recessive model with disease allele frequency 0.3 (prevalence of 9%); the second, a dominant model with disease allele frequency 0.05 (prevalence of 9.75%); the third, a complex disease model with a risk allele frequency 0.1 and penetrances 0.5, 0.25 and 0.05 (prevalence of 9.05%); the fourth, a complex disease model with a risk allele frequency 0.05 and penetrances 0.3, 0.15 and 0.05 (prevalence of 6.01%); the fifth, a complex disease model with a risk allele frequency 0.1 and penetrances 0.25, 0.1 and 0.05 (prevalence of 6.1%). The powers are evaluated for three marker allele frequencies: m = 0.1, 0.3, 0.5; two values of the recombination fraction: θ = 0.05, 0.01; and four levels of LD: D' = 0 (no allelic association), 0.33, 0.67, 1.0 (complete LD). Consistent with intuitive expectations, we find that the TDT/PDT design with 200 informative trios is more powerful than a case-control design with 200 cases and 200 controls, except when the heterozygosity of the marker locus is high (m = 0.5). However, because informative families can be ascertained only after genotyping of parents, a more appropriate comparison would be based on the number of families to be screened in a TDT/PDT design to obtain 200 informative families. As expected, the number of families to be screened increases as the heterozygosity at the marker locus decreases, and this is borne out by all the tables; the sample size requirement decreases faster over the lower range of marker allele frequencies (0.1-0.3 compared with 0.3-0.5). Similarly, the number of families to be screened decreases with an increase in the value of D'.
This follows from the fact that the sample size requirement depends on the frequencies of the haplotypes based on the marker locus and the disease locus. However, it is interesting to note that although the power of the TDT/PDT is higher for smaller values of θ, the number of families to be screened does not depend on θ. Irrespective of the disease model, we find that for equivalent power, the number of families to be screened in a TDT/PDT design far exceeds the number of cases in a case-control design with an equal number of cases and controls. The difference in the sample size requirements becomes less pronounced with (i) an increase in heterozygosity, (ii) a decrease in the value of D' and (iii) a decrease in the penetrance values (i.e., with an increasing degree of disease complexity). In order to evaluate the effect of sample size on the relative powers of the two designs, we simulated data on N cases and N controls as well as N informative families under the five models for N = 100, 500. Although these results are not presented, for brevity, the qualitative inferences are similar to those described above. However, we observe that with an increase in N, the difference in the powers between the two designs becomes negligible, although the number of families that need to be screened to get a larger number of informative families would also increase.
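The dependence of the screening burden on marker heterozygosity can be illustrated with a rough baseline calculation. The sketch below assumes that parental marker genotypes follow Hardy-Weinberg proportions and are independent of the offspring's disease status, which is plausible only when D' = 0; it therefore ignores the ascertainment and LD effects that drive the dependence on D' reported above, and it is not expected to reproduce the figures in the tables. The function names are hypothetical.

```python
def prob_informative(m):
    """P(at least one of two parents is heterozygous) at a marker with allele frequency m."""
    heterozygosity = 2.0 * m * (1.0 - m)
    return 1.0 - (1.0 - heterozygosity) ** 2

def expected_screened(n_informative, m):
    """Expected number of trios to screen to obtain n_informative informative trios."""
    # Mean number of trials needed for r successes with success probability p is r / p.
    return n_informative / prob_informative(m)

for m in (0.1, 0.3, 0.5):
    print(m, round(prob_informative(m), 4), round(expected_screened(200, m), 1))
```

Under these assumptions the drop in the expected number of screened trios is much larger between m = 0.1 and m = 0.3 than between m = 0.3 and m = 0.5, which agrees qualitatively with the pattern described in the text.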
Table 1: Power comparisons under a recessive model with disease allele frequency 0.3 (prevalence of 9%)

Table 2: Power comparisons under a dominant model with disease allele frequency 0.05 (prevalence of 9.75%)

Table 3: Power comparisons under a complex disease model with a risk allele frequency 0.1 and penetrances 0.5, 0.25 and 0.05 (prevalence of 9.05%)

Table 4: Power comparisons under a complex disease model with a risk allele frequency 0.05 and penetrances 0.3, 0.15 and 0.05 (prevalence of 6.01%)

Table 5: Power comparisons under a complex disease model with a risk allele frequency 0.1 and penetrances 0.25, 0.1 and 0.05 (prevalence of 6.1%)




   Discussion


It is clear from analytical studies that population-based genetic case-control designs suffer from the inherent limitation of population stratification. Because it is infeasible in most scenarios to ascertain whether the samples collected for case-control association analyses are genetically homogeneous, novel positive findings are always liable to be false positives. Family-based tests for association such as the TDT and PDT circumvent the problem of population stratification, as positive association findings based on these tests are possible only in the presence of linkage between the marker locus and the disease locus. However, the data requirements (such as informative trios) for family-based association analyses are much more demanding than those for case-control studies. Thus, both study designs have advantages and limitations. In this light, the current study provides an alternative framework for statistical comparison based on power.

We found that the TDT or PDT based on a set of informative trios is more powerful in detecting association than a case-control design in which the number of cases (and of controls) equals the number of informative trios, except when the heterozygosity of the marker locus is very high. However, a fairer statistical comparison, between the total number of trios screened in the TDT or PDT analysis and the number of cases (or controls) required in a case-control design to obtain equivalent power, shows that the case-control design requires far smaller samples. Moreover, it needs to be emphasized that while a case-control design comprising N cases and N controls requires genotyping of 2N individuals, a TDT or PDT design with N trios requires an expected genotyping of (2+α)N individuals, where α is the proportion of informative trios. Thus, in view of the fact that the case-control design yields more power than the TDT/PDT when the number of cases (or controls) equals the number of trios screened, the relative gain in a case-control design is even greater when the genotyping costs are taken into consideration. We would like to highlight that while family-based association analyses are protected against population stratification with respect to false positives, they may be adversely affected with respect to false negatives. Thus, it is of interest to compare the powers of the case-control design and the TDT/PDT in the presence of population stratification. This is statistically challenging, as population stratification induces an inflated rate of false positives in the case-control framework and, hence, a direct comparison of powers without adjusting the distributional thresholds for stratification is not statistically valid.
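The genotyping-cost argument above amounts to simple arithmetic, sketched here with hypothetical numbers. The formula (2+α)N is read as follows: both parents of every screened trio are genotyped, and the offspring is genotyped only when the trio turns out to be informative. The function names and the example values are illustrative and are not taken from the tables.

```python
def case_control_genotypings(n_cases):
    """Individuals genotyped in a design with n_cases cases and n_cases controls."""
    return 2 * n_cases

def trio_genotypings(n_screened, alpha):
    """Expected individuals genotyped when n_screened trios are screened.

    alpha is the proportion of informative trios: two parents are genotyped
    in every trio, plus the offspring in the informative ones.
    """
    return (2 + alpha) * n_screened

# Hypothetical example: 600 screened trios, half of them informative.
print(case_control_genotypings(200))   # 200 cases + 200 controls
print(trio_genotypings(600, 0.5))      # 1200 parents + 300 offspring
```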

We plan to carry out extensive simulations under population stratification and compare the powers of the two procedures after adjustments of stratification in the case-control analyses based on a principal components approach. [10]


   Acknowledgments


This work was supported by the Fogarty International Center, National Institutes of Health, USA, grant R01 TW 006604-05 to Saurabh Ghosh and by the fellowship 09/093(0111)/2008-EMR-I to Tanushree Haldar from the Council of Scientific and Industrial Research (CSIR).

 
   References

1. Ott J. Analysis of Human Genetic Linkage. 3rd ed. Baltimore: Johns Hopkins University Press; 1999.
2. Weir BS. Genetic Data Analysis II. Sunderland, MA: Sinauer; 1996.
3. Risch N, Merikangas K. The future of genetic studies of complex human disorders. Science 1996;273:1516-7.
4. Ghosh S. Interpreting a genetic case-control finding: What can be said, what cannot be said and its implications in Indian populations. Ind J Hum Genet 2007;13:1-4.
5. Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, et al. Ethnic India: A genomic view, with special reference to peopling and structure. Genome Res 2003;13:2277-90.
6. Thangaraj K, Sridhar V, Kivisild T, Reddy AG, Chaubey G, Singh VK, et al. Different population histories of the Mundari- and Mon-Khmer-speaking Austro-Asiatic tribes inferred from the mtDNA 9-bp deletion/insertion polymorphism in Indian populations. Hum Genet 2005;116:506-17.
7. Indian Genome Variation Consortium. Genetic landscape of the people of India: A canvas for disease gene exploration. J Genet 2008;87:3-20.
8. Devlin B, Roeder K, Wasserman L. Genomic control, a new approach to genetic-based association studies. Theor Popul Biol 2001;60:155-66.
9. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67:170-81.
10. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association analysis. Nat Genet 2006;38:904-9.
11. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506-16.
12. Martin ER, Monks SA, Warren LL, Kaplan NL. A test for linkage and association in general pedigrees: The pedigree disequilibrium test. Am J Hum Genet 2000;67:146-54.



 
 





 
