HUMAnN: The HMP Unified Metabolic Analysis Network


Looking for prerelease HUMAnN software? You can obtain the latest snapshot here:

humann-0.98.tar.gz

Note that this is the finalized version of the software used in the Human Microbiome Project and in our manuscript, which is currently in press at PLoS Computational Biology. Please contact me if you'd like a preprint or have any comments, suggestions, or bug reports. While review is wrapping up, please feel free to obtain the latest beta version from our Mercurial source code repository at http://bitbucket.org/chuttenh/humann using the hg clone command.

HUMAnN is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN takes these reads as inputs and produces gene and pathway summaries as outputs:

  • The abundance of each orthologous gene family in the community. Orthologous families are groups of genes that perform roughly the same biological roles. HUMAnN uses the KEGG Orthology (KO) by default, but any catalog of orthologs (COG, NOG, etc.) can be employed with minor changes.
  • The presence/absence of each pathway in the community. HUMAnN refers to pathway presence/absence as "coverage," and defines a pathway as a set of two or more genes. HUMAnN uses KEGG pathways and modules by default, but again can easily be modified to use GO terms or other gene sets.
  • The abundance of each pathway in the community, i.e. how many "copies" of that pathway are present.
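To make these outputs concrete, here is a minimal Python sketch of how gene-family abundances for a single sample could be summarized into pathway coverage and pathway abundance. It is a simplified illustration, not HUMAnN's actual algorithm. The 75% detection threshold, the use of the median, and the function names `pathway_coverage` and `pathway_abundance` are hypothetical choices, and the KO identifiers in the example are placeholders.

```python
from statistics import median


def pathway_coverage(abundances, pathway_kos, min_fraction=0.75):
    """Call a pathway present if at least `min_fraction` of its gene
    families have nonzero abundance (hypothetical threshold)."""
    detected = sum(1 for ko in pathway_kos if abundances.get(ko, 0.0) > 0)
    return detected / len(pathway_kos) >= min_fraction


def pathway_abundance(abundances, pathway_kos):
    """Estimate how many 'copies' of a pathway are present as the median
    abundance of its gene families (missing families count as zero)."""
    return median(abundances.get(ko, 0.0) for ko in pathway_kos)
```

For example, a four-gene pathway whose families have abundances 4.0, 2.0, 0.0, and 6.0 has three of four genes detected, so this sketch calls it present, and its median abundance is 3.0. A robust summary such as the median keeps one unusually abundant gene from inflating the pathway estimate on its own.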

HUMAnN can be used with any translated BLAST program, with out-of-the-box support for NCBI BLAST, USEARCH, MBLASTX, and MAPX. The pipeline converts sequence reads into coverage and abundance tables summarizing the gene families and pathways in one or more microbial communities. This lets you analyze a collection of metagenomes as a matrix of gene/pathway abundances, just as you might analyze a collection of microarrays.
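The step from translated-search output to an abundance matrix can be sketched in Python as follows. This sketch assumes NCBI BLAST tabular output (`-outfmt 6`, where column 1 is the read, column 2 the reference gene, and column 12 the bit score), keeps only the best-scoring hit per read, and uses a hypothetical `gene2ko` dictionary that maps reference genes to KO identifiers. HUMAnN's own hit weighting and normalization are not reproduced here, and all function names are illustrative.

```python
import csv
from collections import Counter


def best_hits(lines):
    """Keep the highest-bit-score reference gene per read from BLAST
    tabular (-outfmt 6) lines: col 1 = read, col 2 = gene, col 12 = bit score."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read, gene, bitscore = row[0], row[1], float(row[11])
        if read not in best or bitscore > best[read][1]:
            best[read] = (gene, bitscore)
    return {read: gene for read, (gene, _) in best.items()}


def ko_counts(lines, gene2ko):
    """Count reads per KO; reads whose best gene has no KO are skipped."""
    counts = Counter()
    for gene in best_hits(lines).values():
        ko = gene2ko.get(gene)
        if ko is not None:
            counts[ko] += 1
    return counts


def abundance_matrix(samples):
    """Turn {sample: Counter of KO counts} into (KO labels, sample labels,
    rows), with one zero-filled row per KO and one column per sample."""
    kos = sorted(set().union(*samples.values()))
    names = sorted(samples)
    rows = [[samples[s].get(ko, 0) for s in names] for ko in kos]
    return kos, names, rows
```

Each row of the result is one gene family and each column one metagenome, which is the microarray-like layout described above; pathway-level matrices can be built the same way from per-sample pathway summaries.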

If you use this software, please cite our paper:
"Metabolic reconstruction for metagenomic data and its application to the human microbiome"
Sahar Abubucker, Nicola Segata, Johannes Goll, Alyxandria M. Schubert, Jacques Izard, Brandi L. Cantarel, Beltran Rodriguez-Mueller, Jeremy Zucker, Mathangi Thiagarajan, Bernard Henrissat, Owen White, Scott T. Kelley, Barbara Methé, Patrick D. Schloss, Dirk Gevers, Makedonka Mitreva, Curtis Huttenhower. PLoS Computational Biology, in press

We are aware that KEGG is now commercial, and we have updated HUMAnN accordingly. In brief, we include the derived files and information needed for normal HUMAnN operation, but the creation and evaluation of synthetic metagenomes are impeded without a KEGG license. If this is an inconvenience for you, please contact the KEGG developers, and contact us if you need assistance evaluating HUMAnN output.

Many thanks to the NIH and to the entire Human Microbiome Project team for making the HMP possible, and to the many collaborators who helped to make HUMAnN a reality. Sahar Abubucker and Makedonka Mitreva (Washington University) co-led the Metabolic Reconstruction group, and Nicola Segata (Harvard School of Public Health) performed many HMP-specific analyses. The pipeline incorporates software from Yuzhen Ye (Indiana University), Beltran Rodriguez-Mueller (SDSU), and Pat Schloss (University of Michigan). Other contributors include Alyx Schubert (University of Michigan), Jeremy Zucker (Broad Institute), Brandi Cantarel (UMD), Qiandong Zeng (Broad Institute), Johannes Goll (JCVI), and many others.

Figure: An overview of HUMAnN

Figure: Metabolic modules differentially abundant in one or more body sites of the human microbiome

Latest Versions

v0.98, 12-06-11


* Allow module2modulec.py to remove unusual duplicate enzymes from KEGG's files
* Allow input filenames to contain underscores
* Fix module size calculation in filter.py
* Fix a bug in hits2enzymes.py to allow a wider range of KEGG gene name detection

v0.971, 10-17-11


* Fix missing exclude.py (thanks to Brandi Cantarel!)

v0.97, 10-17-11


* Add several internal evaluation pipelines in response to initial reviews
* Fix hits2*.py handling of zero/very small e-values (thanks to Fah Sathira!)

v0.96, 07-28-11


* MAJOR CHANGE: KEGG is now commercial, and HUMAnN has been updated accordingly
** KEGG-derived information needed for normal operation is included
** KEGG files needed for synthetic metagenome construction are _not_ included
** "Frozen" synthetic metagenome evaluation is still possible
** Please contact us directly for more information if needed
* Add documentation on potential maq issues (thanks to Shinichi Sunagawa!)
* Fix a typo in fastq2fasta.py formatting (thanks to Shinichi Sunagawa!)
* Fix a typo in module2modulec.py formatting (thanks to Kathryn Iverson!)
* Fix a typo in eco.py for overly sparse input files (thanks to Jeffrey Werner!)
* Work around Mac OS X zcat issues (thanks to Jeffrey Werner!)

v0.95, 05-18-11


* Fix a typo in hits2enzymes.py (only affected an unused filter option)
* Add complete parameter evaluation process to HMP pipeline