Some bioBakery Tools

The bioBakery tools are many and varied; we focused on taxonomic (MetaPhlAn) and functional (HUMAnN) annotation of metagenomes.

PICRUSt uses 16S marker gene data to predict metagenome content and, from that, functional profiles. It discards OTUs that were not identified against its reference (for example, unidentified OTUs in a QIIME OTU table), so the longer the 16S sequences you use to generate the initial taxonomic IDs, the better. Both the documentation and the readability of the output could be improved.
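To make the prediction step concrete, here is a toy sketch of the general PICRUSt approach: divide each OTU's count by its predicted 16S copy number, then multiply by that OTU's predicted gene content and sum per gene. The function names, OTU IDs, KO-style gene IDs and all numbers are hypothetical; this is not PICRUSt's code or command-line interface.

```python
"""Toy sketch of PICRUSt-style metagenome prediction from a 16S OTU table.

All OTU IDs, copy numbers and gene counts are made-up illustration values.
"""


def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's read count by its predicted 16S copy number.

    OTUs with no prediction (not identified against the reference) are
    dropped, mirroring how unidentified OTUs are discarded.
    """
    return {
        otu: count / copy_numbers[otu]
        for otu, count in otu_counts.items()
        if otu in copy_numbers
    }


def predict_metagenome(normalized, gene_content):
    """Sum each OTU's predicted gene copies, weighted by its normalized abundance."""
    totals = {}
    for otu, abundance in normalized.items():
        for gene, copies in gene_content.get(otu, {}).items():
            totals[gene] = totals.get(gene, 0.0) + abundance * copies
    return totals


if __name__ == "__main__":
    otu_counts = {"otu_A": 40, "otu_B": 30, "denovo_1": 25}
    copy_numbers = {"otu_A": 4, "otu_B": 1}  # denovo_1 has no prediction
    gene_content = {"otu_A": {"K00001": 2, "K00002": 1}, "otu_B": {"K00002": 3}}
    norm = normalize_by_copy_number(otu_counts, copy_numbers)
    print(norm)
    print(predict_metagenome(norm, gene_content))
```

Note how `denovo_1` simply vanishes from the prediction: its 25 reads contribute nothing, which is why getting as many OTUs identified as possible matters.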

MetaPhlAn classifies reads by matching them against clade-specific marker genes drawn from reference genomes, and calculates relative abundances from those matches. MetaPhlAn does have the capability to generate a custom database from your own reference genomes. However, we found Kraken a better use of our time: it does much the same job and runs faster.
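As a rough illustration of marker-based abundance estimation, the sketch below converts read hits on each clade's markers into per-marker coverage, takes a trimmed mean per clade, and normalizes across clades. The clade and marker names are hypothetical, and the trimmed mean is a stand-in for a robust average; MetaPhlAn's actual statistic and defaults may differ.

```python
"""Toy sketch of marker-gene-based relative abundance estimation.

Hypothetical inputs: clades, marker names, read counts and marker lengths are
illustration values, not MetaPhlAn's database or output.
"""
from statistics import mean


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values.

    A stand-in for a robust average; MetaPhlAn's exact statistic may differ.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] or ordered  # fall back if trimming empties it
    return mean(kept)


def clade_abundances(hits, marker_lengths, trim=0.1):
    """Relative abundance per clade from reads mapped to its marker genes.

    hits: {clade: {marker: read_count}}; list markers with zero hits as 0.
    marker_lengths: {marker: length_in_bp}
    """
    coverage = {}
    for clade, marker_hits in hits.items():
        # Per-marker coverage: reads per base of marker sequence.
        per_marker = [n / marker_lengths[m] for m, n in marker_hits.items()]
        coverage[clade] = trimmed_mean(per_marker, trim)
    total = sum(coverage.values())
    return {clade: cov / total for clade, cov in coverage.items()}
```

For example, a clade whose two markers each get 0.3 reads per base and a clade whose markers each get 0.1 reads per base come out at 0.75 and 0.25. Averaging over several markers, and trimming outliers, keeps one spuriously high marker from dominating a clade's estimate.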

HUMAnN generates a functional abundance table and assesses the completeness of pathways. It takes the organisms that MetaPhlAn identifies and maps reads against those organisms' genes first. It can also run without MetaPhlAn output when only unstratified (not per-organism) results are needed. Abundances are normalized by gene length and can be further normalized for sequencing depth.
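The two normalizations can be illustrated with a small sketch: reads per kilobase (RPK) corrects for gene length, and rescaling RPK values to copies per million (CPM) corrects for sequencing depth, the kind of step HUMAnN's renormalization script (`humann_renorm_table` in HUMAnN 3) performs. The function names and gene data are hypothetical, and HUMAnN's own abundance computation is more involved (for example, in how it handles reads that align to more than one gene).

```python
"""Toy sketch of HUMAnN-style gene abundance units.

Simplified assumption: each read counts once toward one gene; this only
illustrates the units, not HUMAnN's actual computation.
"""


def reads_per_kilobase(read_counts, gene_lengths_bp):
    """Normalize read counts by gene length: reads per kilobase (RPK)."""
    return {g: n / (gene_lengths_bp[g] / 1000) for g, n in read_counts.items()}


def copies_per_million(rpk):
    """Rescale RPK values so they sum to one million (depth normalization)."""
    total = sum(rpk.values())
    return {g: v / total * 1e6 for g, v in rpk.items()}


if __name__ == "__main__":
    # Hypothetical genes: same read count, different lengths.
    rpk = reads_per_kilobase({"geneA": 500, "geneB": 500}, {"geneA": 1000, "geneB": 500})
    print(rpk)  # geneB gets twice the RPK of geneA
    print(copies_per_million(rpk))
```

Without the length correction, a long gene collects more reads simply because it is a bigger target; without the depth correction, samples sequenced more deeply look richer in every function.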

MetaPalette and Bracken

This week, we mostly discussed MetaPalette. It can be tricky to install correctly, and I for one have had problems with memory limits, but I like it because it pulls genomes from NCBI databases covering Bacteria, Archaea, Viruses and Eukaryotes, which is rare in an annotation software workflow. I also like that it relies on k-mers of two sizes (30 and 50) and makes assignments based on the lowest common ancestor. One suggestion was to run the program in a virtual server environment.
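To show why two k-mer sizes carry more information than one, here is a toy sketch that measures what fraction of a reference genome's 30-mers and 50-mers appear in a set of reads. The function names are hypothetical, and this illustrates only the multi-k comparison idea, not MetaPalette's actual abundance-estimation procedure.

```python
"""Toy sketch: compare a sample to one reference genome at two k-mer sizes.

Illustration of the multi-k idea only; function names are hypothetical and
this is not MetaPalette's abundance-estimation procedure.
"""


def kmers(seq, k):
    """Set of all length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(sample_reads, reference, k):
    """Fraction of the reference's k-mers that occur in at least one read."""
    ref = kmers(reference, k)
    if not ref:
        return 0.0  # reference shorter than k
    seen = set()
    for read in sample_reads:
        seen |= kmers(read, k)
    return len(ref & seen) / len(ref)


def multi_k_profile(sample_reads, reference, ks=(30, 50)):
    """Shared-k-mer fraction at each k; longer k-mers are more specific."""
    return {k: shared_fraction(sample_reads, reference, k) for k in ks}
```

A single base difference breaks every k-mer that overlaps it, so reads from a close relative of the reference (rather than the reference itself) tend to share a noticeably smaller fraction of 50-mers than of 30-mers; comparing the two values gives a rough signal of how closely related the sample organism is.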

We also briefly discussed Bracken, which adjusts results from Kraken using genome size and Bayesian statistics. The product is a table that includes the reads Kraken originally assigned to each taxon, the re-estimated read counts, and the final proportions. Matt described an experiment investigating viral reads in horse cells. A custom database worked well there, classifying about 60% of the reads of interest.
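Bracken's output table is easy to post-process. The sketch below parses a report and recomputes percentages from the re-estimated counts. It assumes a tab-separated file whose header uses the columns `name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads` and `fraction_total_reads`; check that against the files your Bracken version writes. The function names and example data are hypothetical.

```python
"""Sketch: summarize a Bracken report.

Assumption: tab-separated with a header row using the columns name,
taxonomy_id, taxonomy_lvl, kraken_assigned_reads, added_reads,
new_est_reads and fraction_total_reads; check this against your own files.
"""
import csv
import io


def read_bracken(handle):
    """Parse a Bracken report into a list of dicts with numeric fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "name": row["name"],
            "kraken_assigned_reads": int(row["kraken_assigned_reads"]),
            "added_reads": int(row["added_reads"]),
            "new_est_reads": int(row["new_est_reads"]),
            "fraction_total_reads": float(row["fraction_total_reads"]),
        })
    return rows


def percentages(rows):
    """Percent of re-estimated reads per taxon (should track fraction_total_reads)."""
    total = sum(r["new_est_reads"] for r in rows)
    return {r["name"]: 100 * r["new_est_reads"] / total for r in rows}


if __name__ == "__main__":
    # Made-up two-species report, for illustration only.
    example = (
        "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
        "added_reads\tnew_est_reads\tfraction_total_reads\n"
        "Species_A\t1\tS\t600\t150\t750\t0.75000\n"
        "Species_B\t2\tS\t200\t50\t250\t0.25000\n"
    )
    rows = read_bracken(io.StringIO(example))
    for r in rows:
        print(r["name"], r["kraken_assigned_reads"], "->", r["new_est_reads"])
    print(percentages(rows))
```

Printing the original and re-estimated counts side by side makes it easy to see which taxa gained the most reads from Bracken's redistribution.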

Kraken: taxonomic sequence classification system

We will discuss this paper on Friday, April 22nd from noon-1pm:

Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

Obtain software from here.

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous tools for this task often relied on sequence alignment or machine learning techniques that were quite slow, which led to the development of much faster but less sensitive abundance estimation programs. Kraken aims for both high sensitivity and high speed by using exact alignments of k-mers against a database that maps each k-mer to the lowest common ancestor (LCA) of the genomes containing it, together with a novel classification algorithm.
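The core idea can be sketched in a few dozen lines: build a database mapping each k-mer to the LCA of the genomes that contain it; then, for each read, count k-mer hits per taxon, score each root-to-leaf path in the resulting classification tree by summing the hits along it, and assign the read to the leaf of the highest-scoring path, with ties going to the LCA of the tied leaves. This is a toy sketch under simplifying assumptions: the taxonomy, genomes and function names are hypothetical, k is 4 rather than Kraken's default of 31, and it ignores canonical (reverse-complement) k-mers, ambiguous bases and Kraken's compact database layout.

```python
"""Toy sketch of Kraken-style exact k-mer matching with LCA classification."""
from collections import Counter


def path_to_root(taxon, parent):
    """Taxon followed by all its ancestors; parent[root] is None."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path


def lca(a, b, parent):
    """Lowest common ancestor of taxa a and b."""
    ancestors = set(path_to_root(a, parent))
    for t in path_to_root(b, parent):
        if t in ancestors:
            return t
    return None  # only reachable if the taxonomy has several roots


def build_db(genomes, parent, k):
    """Map each k-mer to the LCA of all taxa whose genomes contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = taxon if kmer not in db else lca(db[kmer], taxon, parent)
    return db


def classify(read, db, parent, k):
    """Return the assigned taxon, or None if no k-mer matches the database."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = db.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified
    # Score the path ending at each hit taxon: total hits on that path.
    best_score, best = -1, []
    for taxon in hits:
        score = sum(hits[t] for t in path_to_root(taxon, parent))
        if score > best_score:
            best_score, best = score, [taxon]
        elif score == best_score:
            best.append(taxon)
    # Ties resolve to the LCA of the tied taxa.
    result = best[0]
    for t in best[1:]:
        result = lca(result, t, parent)
    return result


if __name__ == "__main__":
    parent = {"root": None, "G": "root", "A": "G", "B": "G"}  # genus G, species A and B
    db = build_db({"A": "ACGTACGGTCA", "B": "ACGTTTGCAAC"}, parent, k=4)
    for read in ["ACGTACGG", "ACGT", "GTACTTTG", "TTTTTTTT"]:
        print(read, classify(read, db, parent, k=4))
```

In this toy run, `ACGTACGG` lands on species A, `ACGT` (shared by both genomes) stops at genus G, `GTACTTTG` has one hit for each species and so resolves to their LCA G, and `TTTTTTTT` stays unclassified. Because only exact dictionary lookups are involved, classification cost grows with read length rather than with alignment work, which is where the speed comes from.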

Have any installation/running questions? Ask here.