Discussion of Megahit

Megahit was easy to install and it ran very quickly on large datasets.

We thought it seemed like a fine approach for a low-complexity dataset. For my data, though, Megahit with the default settings assembled only 12% of the reads from one of my samples, and only 3% of the reads in the coassembly. Perhaps a better strategy for a high-complexity dataset would be to normalize k-mer coverage first using, for example, diginorm or stacks, before running Megahit with the meta-large preset, or even to switch to an assembler with more options.
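To make the normalization idea concrete, here is a toy sketch of the principle behind digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This is not the diginorm implementation itself. The function names, the tiny k, and the cutoff are assumptions chosen for illustration, and the sketch ignores reverse complements and the memory-efficient counting a real tool would need.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers in seq (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=4, cutoff=2):
    """Toy digital normalization.

    Keep a read only while the median count of its k-mers, counted
    over the reads already kept, is below `cutoff`. The values of
    `k` and `cutoff` are illustrative, not recommended settings.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add coverage
    return kept


# Five copies of an abundant read plus one rare read.
reads = ["ACGTTGCA"] * 5 + ["GGATCCAA"]
kept = normalize_reads(reads, k=4, cutoff=2)
print(len(reads), "reads in,", len(kept), "reads kept")
```

In this example the redundant copies of the abundant read are discarded once its k-mers reach the cutoff, while the rare read is retained, which is the effect you would want before handing a high-complexity dataset to the assembler.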

We also discussed other assemblers, and decided that it might be best to pick your assembler based on the dataset in question.


Next Friday, June 24, we’ll discuss this paper:

Li et al. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics.

Megahit is an assembler for metagenomic data. It was developed to work on large, complex datasets. It’s available from GitHub, and doesn’t do any pre-processing for you.