Computational Aspects of Molecular Biology
The extraordinary increase in nucleotide and protein sequence data presents new challenges and opportunities in terms of visualizing and analyzing these sequences. My laboratory is examining novel ways of analyzing DNA and RNA sequences using customized computer programs and commercially available software. Earlier work focused on Markov analysis of nucleotide sequences to detect previously unidentified motifs involved in the regulation of gene expression, chromatin structure and/or RNA processing. This approach detects statistical outliers in sequences using a Markov estimate of expected frequencies. Part of the interest in this approach is that very few a priori assumptions are involved, meaning that new molecular mechanisms may be discovered via their cis motifs.
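One common way to form a Markov estimate of expected frequencies is sketched below. It is an illustration under stated assumptions, not necessarily the procedure used in this laboratory. The expected count of a word of length k is taken from the counts of its two overlapping (k-1)-mers and its central (k-2)-mer, and words whose observed count greatly exceeds that expectation are reported as candidate outliers. The function names and the `min_ratio` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping words of length k in a nucleotide string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq, k):
    """Return (observed, expected) counts for every k-mer seen in seq.

    Assumption for this sketch: a maximal-order Markov estimate,
        E(w1..wk) = N(w1..wk-1) * N(w2..wk) / N(w2..wk-1)
    """
    if k < 3:
        raise ValueError("k must be at least 3 for this estimate")
    observed = count_kmers(seq, k)
    flanks = count_kmers(seq, k - 1)   # prefix and suffix (k-1)-mers
    cores = count_kmers(seq, k - 2)    # shared central (k-2)-mer
    expected = {
        word: flanks[word[:-1]] * flanks[word[1:]] / cores[word[1:-1]]
        for word in observed
    }
    return observed, expected


def overrepresented(seq, k, min_ratio=2.0):
    """List (word, observed, expected, ratio) with ratio >= min_ratio.

    min_ratio is an arbitrary illustrative cutoff, not a significance test.
    """
    observed, expected = markov_expected(seq, k)
    hits = [
        (word, observed[word], expected[word], observed[word] / expected[word])
        for word in observed
        if observed[word] / expected[word] >= min_ratio
    ]
    return sorted(hits, key=lambda hit: hit[3], reverse=True)
```

Calling `overrepresented(sequence, 6)` on a long sequence would rank hexamers by their observed-to-expected ratio. A real analysis would replace the simple ratio with a proper statistical score and would usually count both strands.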
Other computational efforts in this laboratory have focused on predicting the accessibility of mRNAs to siRNA, assuming constantly changing local structures and a Boltzmann weighting of these possible structures. This work reflects an increasingly common situation in which the output of off-the-shelf software is analyzed in novel ways by customized software.
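The Boltzmann weighting idea can be illustrated with a short sketch. It assumes that a folding program has already produced a set of candidate structures in dot-bracket notation, each with a free energy in kcal/mol. The input format, the function names and the definition of accessibility used here (the summed weight of structures in which every base of a target window is unpaired) are assumptions made for illustration, not a description of the laboratory's software.

```python
import math

# Gas constant in kcal/(mol*K) times 310.15 K (37 degrees C).
RT = 0.0019872 * 310.15


def boltzmann_weights(energies, rt=RT):
    """Normalized Boltzmann weights exp(-E/RT) / Z for a list of energies."""
    e_min = min(energies)  # shift energies to avoid overflow in exp()
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def unpaired_probabilities(structures, energies, rt=RT):
    """Per-base probability of being unpaired ('.') across the ensemble."""
    weights = boltzmann_weights(energies, rt)
    length = len(structures[0])
    probs = [0.0] * length
    for structure, weight in zip(structures, weights):
        if len(structure) != length:
            raise ValueError("all structures must have the same length")
        for i, symbol in enumerate(structure):
            if symbol == ".":
                probs[i] += weight
    return probs


def window_accessibility(structures, energies, start, size, rt=RT):
    """Summed weight of structures whose window [start, start+size) is fully unpaired."""
    weights = boltzmann_weights(energies, rt)
    open_window = "." * size
    return sum(
        weight
        for structure, weight in zip(structures, weights)
        if structure[start:start + size] == open_window
    )
```

For an siRNA target site, `window_accessibility(structures, energies, start, 19)` would estimate how often a 19-nucleotide site is fully open within the sampled ensemble. The estimate is only as good as the set of structures supplied, since structures missing from the sample contribute no weight.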
Customized analysis is also part of recent efforts in this laboratory to determine the relationship between the large-scale distribution of retrotransposons in genomes, monoallelic gene expression and X-chromosome inactivation. Genetic imprinting, chromatin structure and repetitive elements have been suspected for several years to be mechanistically related, but the exact relationship between these features has proven elusive. The published correlations are modest, and the distinction between cause and consequence is unclear. The availability of fairly accurate whole-chromosome assemblies in human and mouse allows a level of computational and statistical analysis that has not previously been possible. Collaborative efforts with Judy Singer-Sam, Ph.D., director of the Division of Biology, are ongoing to systematically examine the relationship between specific repetitive elements and imprinted regions.
In summary, we are generating a suite of algorithms that can be customized to specific needs for collaborative as well as independent research. The questions addressed range from open-ended discovery of novel motifs as statistical outliers to narrowly defined questions regarding the accessibility of mRNA to various probes. We are at a moment in genomics where new and relatively simple computer programs can greatly decrease the distance between curiosity and answers.