We had a good (half) conversation about workflows today.

The first thing we talked about was getting the sample that you want.  It seems like it’s more difficult than it should be to isolate the gunk/biomass of interest, especially if you’re working in a host system.  There’s probably always going to be host contamination, but it’s a waste to sequence all of it.  So, we decided that the best approach is to prepare your sample to get the best yield up front rather than trying to sort out the sequences you want later.

Library preparation should maybe be done in a separate room or biosafety cabinet.  And clean your pipettes!  A good model to follow might be the procedures used by people who work with ancient human DNA.

Also, make sure to sequence your kit! And use negative controls.  If you get a result from a negative control, should you eliminate taxa?  One approach we discussed was to run multiple water blanks and take the median abundance of each taxon across the blanks, then compare samples to the blanks.  If a taxon’s median abundance in the blanks is higher than its abundance in a sample, throw that taxon out.
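The blank-filtering idea could be sketched like this (a toy example, not anyone’s actual pipeline — the data structures and taxon names here are made up for illustration):

```python
import statistics

def filter_by_blanks(sample_counts, blanks):
    """Drop taxa whose median abundance across the water blanks
    meets or exceeds their abundance in the sample.

    sample_counts: {taxon: abundance} for one sample
    blanks: list of {taxon: abundance} dicts, one per blank
    """
    kept = {}
    for taxon, abundance in sample_counts.items():
        # Taxa absent from a blank count as 0 in that blank.
        blank_median = statistics.median(b.get(taxon, 0) for b in blanks)
        if abundance > blank_median:
            kept[taxon] = abundance
    return kept

sample = {"Salmonella": 120, "Ralstonia": 5, "Bacillus": 40}
blanks = [{"Ralstonia": 30, "Bacillus": 2},
          {"Ralstonia": 25},
          {"Ralstonia": 40, "Bacillus": 1}]

# Ralstonia (a classic kit contaminant) gets dropped here because
# its blank median (30) exceeds its sample abundance (5).
print(filter_by_blanks(sample, blanks))
```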

Technical replicates are a good idea, but how do you deal with them?  Try comparing the coefficient of variation across your technical replicates to the coefficient of variation across your biological replicates: if the technical noise is as big as the biological signal, you have a problem.
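That comparison is just stdev over mean for each set of replicates.  A minimal sketch, with made-up abundance values for one taxon:

```python
import statistics

def cv(values):
    """Coefficient of variation: standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical abundances of a single taxon:
technical  = [10.2, 9.8, 10.5]   # same library, sequenced three times
biological = [8.0, 12.0, 15.0]   # three independent samples

# You want technical CV well below biological CV; otherwise
# measurement noise is swamping the biology you care about.
print(cv(technical), cv(biological))
```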

If you can, fit your whole experiment on one run.  We swapped horror stories of samples from different runs separating cleanly when principal component analysis was done 😦
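You can check for that kind of batch effect yourself: run PCA on the abundance table and see whether samples cluster by run.  Here is a self-contained sketch using NumPy, with a simulated table where I’ve deliberately injected a per-run offset (everything here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake taxon-abundance table: 3 samples from run A, 3 from run B,
# 50 taxa each, with run B shifted to mimic a batch effect.
run_a = rng.normal(0.0, 1.0, size=(3, 50))
run_b = rng.normal(2.0, 1.0, size=(3, 50))
X = np.vstack([run_a, run_b])

# PCA via SVD on the mean-centered matrix.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ vt[0]  # projection of each sample onto PC1

# If the two runs land on opposite sides of PC1, the run itself
# is dominating the variance -- the horror story above.
print("run A on PC1:", pc1[:3])
print("run B on PC1:", pc1[3:])
```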

Once you have your sequences, prepare them for downstream analysis by trimming adapters, filtering to a quality you’re comfortable with, and, possibly, merging paired ends.  If you have 16S data, merge first, then do QC.  With metagenomics, you can merge the high-quality reads (after QC).  PEAR and FLASH are two read-mergers we’ve used and liked.
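To make the quality-filtering step concrete, here’s a toy version of mean-quality filtering on FASTQ-style records.  Real pipelines use the dedicated tools mentioned below; this just shows what “filtering to a quality you’re comfortable with” means mechanically (the reads and threshold are invented):

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming Sanger/Illumina 1.8+
    encoding (ASCII value minus 33)."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)

def quality_filter(records, min_mean_q=20):
    """Keep reads whose mean quality meets the threshold.
    records: iterable of (read_id, sequence, quality) tuples."""
    return [r for r in records if mean_phred(r[2]) >= min_mean_q]

reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Q40: high quality
    ("read2", "ACGT", "!!!!"),  # '!' encodes Q0: junk
]
print([r[0] for r in quality_filter(reads)])
```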

As far as quality control goes, Matt here at the genome center has a set of tools you could use if you have a known insert size.  Guillaume uses custom scripts to trim adapters and remove low-quality reads.  Trimmomatic and the FASTX-Toolkit do this too.  They’re probably all going to do much the same thing, and the differences will mostly be in run time.

So, now that you have reads you’re comfortable with, the first thing most everyone wants to do appears to be taxonomic assignment.  If you merged paired ends, make sure to classify the merged reads plus only the unmerged forward reads, OR use a tool that takes pairing information into account, so you’re not counting the same fragment twice.  Some tools we talked about: Kraken/Bracken, MetaPalette, Discribinate, MEGAN.
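The “merged plus unmerged-forward” bookkeeping is simple set logic.  A sketch (read IDs are illustrative; real ID schemes vary by instrument and by merger):

```python
def reads_for_classification(merged_ids, forward_ids):
    """Return one read ID per sequenced fragment: every merged read,
    plus forward reads whose pair failed to merge.  This avoids
    double-counting a fragment as both its merged and forward read."""
    merged = set(merged_ids)
    unmerged_forward = set(forward_ids) - merged
    return merged | unmerged_forward

merged = {"frag1", "frag2"}
forward = {"frag1", "frag2", "frag3"}  # frag3's pair failed to merge

# Each of the three fragments is counted exactly once.
print(sorted(reads_for_classification(merged, forward)))
```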

That’s as far as our conversation got in an hour.  We’ll definitely pick up from here in the future!