My lab in the Biostatistics Department at the Harvard School of Public Health focuses on understanding the function of microbial communities, particularly that of the human microbiome in health and disease. This combines computational methods development for wrangling large data collections with biological analyses and laboratory experiments that link the microbiome in human populations to specific microbiological mechanisms. In particular, we've worked extensively with the NIH Human Microbiome Project to help develop the first comprehensive map of the healthy Western adult microbiome, and there's plenty of work left to keep us busy understanding how human-associated microbial communities can be used as a means of diagnosis or therapeutic intervention on the continuum between health and disease.
Specific research areas we're working on include:
Computational models for functional genomics in microbial communities. These typically involve bioinformatic algorithm development to relate the microbes present in an environment to their biomolecular activities using metagenomic, metatranscriptomic, proteomic, or metabolomic data (or, when possible, all of the above). Model bacteria like E. coli have been very closely studied in isolation, but there are upwards of two pounds of bacteria in your gut right now, and we don't know what they're doing there. Or rather, we know that they're helping to digest your food, to keep your immune system in good shape, and probably to fight off metabolic disorders like diabetes and obesity - but we don't yet know how. Much the same can be said of pathogens; every bacterial or viral infection creates its own "community" comprising a population of cells evolving within your body. If you're interested in the nuts and bolts, they include:
- How can we use large-scale data integration to predict protein function and biological networks in uncharacterized microbial populations? Sequence homology doesn't cut it in most microorganisms; horizontal gene transfer and sub/neo-functionalization wipe out a lot of the expected signal. And between 1/3 and 2/3 of the microbial genes we encounter in the wild are typically uncharacterized.
- How can we integrate data across species in a microbial community in order to predict the overall behavior of the system? This requires a different perspective on biological function, since there's a difference between what one bug in your gut is doing to survive and what they're all doing together to help eat your food.
- Where does "you" stop and your microbial community begin? There are more microbial cells in you than human cells, and they're just as beneficial to your health as your kidneys or lungs (unless they're pathogens, in which case they're not as nice). I'm interested in using large-scale data mining to assemble a joint picture of microbe-microbe and host-microbe interactions as a single biological system.
- Can we develop experimental models in the lab that accurately reflect the behavior of microbial communities in the wild? It's very difficult to build something tractable in a dish that "acts like" a human-associated microbial community in the absence of a true host or animal model.
It's important to translate the human microbiome into actionable diagnostics and therapeutics as well. Bioinformatic models of microbiology and molecular biology typically inform our understanding of mechanism - exactly which bugs or molecules are carrying out specific biological processes. But we've still got a wide gap to bridge between these details and understanding the microbiome in large human populations, let alone whether it can be used to predict or mitigate disease risk. Areas of translational or epidemiological interest for the lab include:
- Gut microbiome structure and function in autoimmune disease, particularly the inflammatory bowel diseases (Crohn's disease and ulcerative colitis), and in other immune disorders such as type 1 diabetes and rheumatoid arthritis. Even though these conditions have highly disparate symptoms, they share both genetic and immunological links, and are increasingly understood to be microbiologically related as well.
- Computational methods for microbiome biomarker discovery, both cross-sectionally among populations and longitudinally over time. The first step in relating the healthy microbiome to disease is understanding exactly when a microbe or microbial gene is statistically, reproducibly different, and the second is determining whether that difference is a cause or an effect.
- Assessing potential interventions on the microbiome for maintaining or improving host health. If we can understand when or how the microbiome has "gone wrong" in disease, we can also begin to correct those malfunctions. Microbial (probiotic), dietary, and pharmaceutical (small molecule) stimuli have all been well studied for their effects on the microbiome in animal models, and we hope they'll become increasingly viable for human health as well.
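To make the biomarker discovery step above a bit more concrete, here's a minimal sketch of one way to ask whether a microbe is "statistically different" between two groups: an exact permutation test on the difference in mean relative abundance, followed by Benjamini-Hochberg correction across taxa. This is an illustration under simple assumptions, not a description of the lab's actual methods; the function names, taxon names, and abundance values are all hypothetical, and real microbiome data (compositional, sparse, confounded) need considerably more care.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of relabeling the pooled samples, so it is only
    practical for small groups (hypothetical helper, for illustration).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not drop ties.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up relative abundances for two hypothetical taxa in two small cohorts.
abundances = {
    "taxon_A": ([0.10, 0.12, 0.11, 0.13, 0.09], [0.30, 0.28, 0.33, 0.31, 0.29]),
    "taxon_B": ([0.20, 0.22, 0.19, 0.21, 0.20], [0.21, 0.20, 0.19, 0.22, 0.20]),
}
taxa = list(abundances)
pvals = [exact_permutation_pvalue(*abundances[t]) for t in taxa]
qvals = benjamini_hochberg(pvals)
for taxon, p, q in zip(taxa, pvals, qvals):
    print(f"{taxon}: p={p:.4f}, q={q:.4f}")
```

A test like this only addresses the first step (is the difference reproducible?); it says nothing about the second (is it a cause or an effect?), which needs longitudinal or interventional data.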
Finally, computational models for functional genomics in higher organisms, particularly for human health and populations, are often not that different from those for microbial communities. Multicellular organisms (and human beings in particular) represent a unique opportunity for bioinformatic analysis. Paradoxically, although we tend to know more details about simple model organisms, the vast majority of the experimental data out there deals with metazoans. However, the same complexities that make us human - hundreds of distinct tissue types, combinatorial intra- and intercellular signaling, and development from a single cell to over ten trillion - make us look a lot more like a microbial community than a unicellular model organism in many cases. A selection of the challenges involved includes:
- Practical. Individual bacteria and most unicellular organisms have in the neighborhood of 5,000 genes, and as you scale up genome size to humans' ~25,000, data management and analysis become difficult.
- Experimental. You can't grow most organisms (or communities) in petri dishes, so there's just plain not as much detailed information available, and computational predictions are harder to validate in a meaningful way.
- Evolutionary. After genomes have been duplicated and rearranged across millions of years, how can you tell when sequence conservation equates to functional conservation? What does it mean when one gene in a model organism is homologous to a whole set of human genes (especially if they're expressed in different tissue types or at different times)?
- Theoretical. How do you best search multiple aligned interaction networks for dense subgraphs? In other words, if you know (or at least guess) what every gene is doing in several related organisms, how do you pick out the interesting bits?
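As a toy version of the "theoretical" question above, here's a minimal sketch that keeps only the interactions conserved across several networks and then looks for a dense subgraph among them by greedy peeling (repeatedly removing the lowest-degree gene and remembering the densest remainder, a standard approximation heuristic). It assumes the networks are already aligned, meaning orthologous genes share one label, and that labels are strings; the function names and gene labels are hypothetical, and this is one simple approach among many rather than the lab's own algorithm.

```python
def conserved_edges(networks):
    """Edges present in every network.

    Assumes nodes are pre-aligned (orthologs share a label) and that at
    least one network is given. Self-loops are ignored.
    """
    edge_sets = [
        {frozenset(edge) for edge in net if edge[0] != edge[1]}
        for net in networks
    ]
    return set.intersection(*edge_sets)


def densest_subgraph(edges):
    """Greedy peeling for a dense subgraph, with density = edges / nodes.

    Repeatedly removes a minimum-degree node (ties broken by label) and
    returns the node set with the highest density seen along the way.
    """
    edges = {frozenset(e) for e in edges if len(set(e)) == 2}
    adj = {}
    for edge in edges:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n_edges = len(edges)
    best_nodes = set(adj)
    best_density = n_edges / len(adj) if adj else 0.0

    while len(adj) > 1:
        u = min(adj, key=lambda node: (len(adj[node]), node))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        del adj[u]
        density = n_edges / len(adj)
        if density > best_density:
            best_nodes, best_density = set(adj), density
    return best_nodes, best_density


# Two made-up interaction networks over hypothetical aligned gene labels.
net1 = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
        ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
net2 = [("b", "a"), ("a", "c"), ("a", "d"), ("b", "c"),
        ("b", "d"), ("c", "d"), ("d", "e"), ("a", "f")]

nodes, density = densest_subgraph(conserved_edges([net1, net2]))
print(sorted(nodes), density)
```

Requiring an edge to appear in every network is a deliberately strict choice; with noisy data you'd more likely weight edges by how many organisms support them.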
My work includes initial efforts to broaden our functional analysis techniques for higher organisms, with the particular goal of using computation to advance the state of genomic medicine and our mechanistic understanding of human disease. This includes not only making computational predictions in multicellular organisms, but also comparing and analyzing their commonalities and differences and using better-studied model systems to enhance our knowledge of less well understood ones.
Like everyone else in academia, I work on plenty of other projects as well. I also love teaching, hosting an introductory bioinformatics course here at Harvard and, previously, working with the novel Integrated Science program at Princeton. Please feel free to contact the lab for more details about our work or our teaching and training activities, or with inquiries as a prospective postdoctoral or visiting researcher.