Genetic Instability, Gene-Gene Interactions, and SNP Sampling Strategies to Identify Disease Alleles
The analysis of genetic variation is expected to further our understanding of the genetic contributions to human diseases and may provide insights into human evolution. Modern genetic analysis of pedigrees segregating highly penetrant cancer susceptibility genes has provided a wealth of information on both cancer pathogenesis and cancer risk. Unfortunately, for most commonly occurring tumors, the high-risk alleles of such loci do not play a significant role. In our project Mapping Interactive Cancer Susceptibility Loci (MICSL), we have developed an affected sibling pair cohort (breast, prostate, colon, and lung cancer) to pursue genes with relatively modest relative risks, especially those that may interact with other such genes. We test the hypothesis that this class of genetic variation and genetic interaction most often determines cancer risk in the population at large.
As of December 2003, we had recruited a total of 871 affected sibling pairs. These consisted of 594 pairs with breast cancer, 288 with prostate cancer, 77 with colon cancer, and 33 with lung cancer. This cohort of affected sibling pairs is one of the largest such resources for linkage analysis in commonly occurring cancers. The major focus of MICSL has been to conduct linkage analysis with 100 candidate genes involved in tumor suppression, DNA repair, cell-cycle control, apoptosis, and drug and xenobiotic metabolism. So far, 97 candidate genes and 238 microsatellite markers have been established in our production stream for genotyping all sibling pairs. We discovered and implemented genotyping for almost one-third of these markers from Human Genome resources; the others represent previously identified markers in public databases. At present, we have constructed a Web-enabled database that contains more than 233,000 genotypes for the preliminary linkage analysis of our candidate genes.
We have preliminary results suggesting that the candidate gene approach we are employing will be quite productive in identifying genes influencing cancer susceptibility. Two-point linkage analyses (candidate gene and marker) have identified numerous potential linkage signals (specifically in our breast and prostate cancer cohorts) that have guided our subsequent multi-point linkage efforts. Our ability to identify prominent linkage signals in our candidate genes has been facilitated by our use of clinical data collected simultaneously with our patient DNA samples. This allows us to stratify our affected sibling pairs based upon additional disease criteria such as family history or pathological features. Utilizing this approach, we have identified a locus on chromosome 3 implicated in prostate cancer risk. We are continuing to characterize this locus via linkage disequilibrium mapping approaches using single nucleotide polymorphisms (SNPs) with the goal of narrowing the disease interval. These results demonstrate how a hypothesis-driven candidate gene approach has proven fruitful, as opposed to genome-wide trawling exercises.
Our sibling pair resources are also facilitating the development of a new methodology for detecting disease gene associations and gene-gene interactions that uses far lower numbers of sibling pairs for disease allele discovery. This approach relies on enrichment strategies for patients based on allele-sharing. We believe it will be possible to partition our sibling pair resources into groups that will allow for the initial discovery of disease variants and their confirmation in a replicate sample set. Ultimately, this allows for the testing of virtually any candidate gene, provided information is known about the biology or function of the gene. The previously described custom Web-enabled database facilitates these analyses.
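As a rough illustration of enrichment by allele-sharing, the following minimal Python sketch counts the alleles that two siblings share identical-by-state (IBS) at a single marker and keeps only the pairs that meet a sharing threshold. The function names, the threshold, and the example genotypes (microsatellite allele sizes) are hypothetical and are not taken from the MICSL pipeline; a real analysis would more likely estimate identity-by-descent sharing from parental genotypes or allele frequencies.

```python
from collections import Counter


def ibs_sharing(geno_a, geno_b):
    """Return the number of alleles (0, 1, or 2) two genotypes share by state."""
    # Multiset intersection keeps each allele the minimum number of times
    # it appears in both genotypes.
    shared = Counter(geno_a) & Counter(geno_b)
    return sum(shared.values())


def enrich_pairs(pairs, min_shared=2):
    """Keep the IDs of sibling pairs whose IBS sharing meets the threshold."""
    return [
        pair_id
        for pair_id, (sib1, sib2) in pairs.items()
        if ibs_sharing(sib1, sib2) >= min_shared
    ]


# Hypothetical genotypes at one marker: (allele 1, allele 2) for each sibling.
pairs = {
    "pair1": ((120, 124), (120, 124)),
    "pair2": ((120, 124), (120, 128)),
    "pair3": ((118, 118), (122, 126)),
    "pair4": ((118, 118), (118, 122)),
}

selected = enrich_pairs(pairs)
print(selected)
```

In this toy data set only the first pair shares both alleles, so it alone would be carried forward; lowering `min_shared` to 1 would also retain the pairs that share a single allele.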
In recent years, the identification and analysis of SNPs have drawn tremendous enthusiasm from geneticists; however, the best SNP-based sampling strategy for human disease studies is still very controversial. The controversy arises from debates over both the SNP density and the sampling strategy needed for the best study design. We have made substantial progress in the analysis of SNP haplotypes to address these issues. Our data suggest that the required SNP density is largely determined by the evolutionary history of current major haplotypes, and a large portion of the genome may require a very high density (<1 kb spacing) of randomly sampled SNPs. Using this concept, we have developed a SNP sampling strategy by selecting SNPs with highly correlated allele states. This has allowed us to demonstrate the existence of evolutionarily conserved haplotype frameworks across ethnic populations that could define probable historical recombination hotspots. These observations have implications for the search for the causal variants of diseases such as cancer based on the CDCV (Common Disease/Common Variant) hypothesis.
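One way to picture selecting SNPs by correlated allele states is the sketch below. It assumes phased, biallelic haplotypes coded as 0/1, uses the squared correlation r^2 as the measure of correlation, and picks representative SNPs with a simple greedy binning rule. The measure, the threshold of 0.8, and the function names are all assumptions made for illustration; the abstract does not specify the procedure actually used.

```python
def r_squared(x, y):
    """Squared correlation (r^2) between two biallelic SNPs coded 0/1."""
    n = len(x)
    p_a = sum(x) / n                      # frequency of allele 1 at SNP x
    p_b = sum(y) / n                      # frequency of allele 1 at SNP y
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                        # a monomorphic SNP carries no signal
        return 0.0
    d = p_ab - p_a * p_b                  # linkage disequilibrium coefficient D
    return d * d / denom


def select_tag_snps(haplotypes, threshold=0.8):
    """Greedily pick SNPs so every SNP has r^2 >= threshold with a chosen one."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        best, best_bin = None, set()
        for i in sorted(untagged):
            # SNPs still untagged that SNP i would represent (itself included).
            bin_i = {
                j for j in untagged
                if j == i or r_squared(columns[i], columns[j]) >= threshold
            }
            if len(bin_i) > len(best_bin):
                best, best_bin = i, bin_i
        tags.append(best)
        untagged -= best_bin
    return tags


# Hypothetical phased haplotypes: rows are chromosomes, columns are SNPs.
haplotypes = [
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
]

tags = select_tag_snps(haplotypes)
print(tags)
```

In this toy example the first three SNPs have perfectly correlated allele states, so a single SNP represents all of them, while the fourth SNP is independent and must be typed separately. Regions with long runs of correlated SNPs therefore need few markers, and regions without them need a much higher density.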