Off to Ecuador: Portable DNA Sequencing for Rapid Species Identification in the Field
In the previous post I documented an experiment with the miniPCR amplifying barcodes to ultimately run on a portable gene sequencer (the MinION) developed by Oxford Nanopore Technologies. Here's the protocol on how I prepared the DNA for the MinION in a nutshell and how my colleague, Stefan Prost, and I have analyzed some of the data thus far.
First off, I pooled the 12 barcode PCR products from the miniPCR and began the library preparation using the 1D PCR barcoding amplicons (SQK-LSK108) Protocol. This has three primary steps, mainly End-prep, Adapter ligation, and AMpure XP bead binding. In the End-prep, you mix ~1 µg DNA with Ultra II End-prep reaction buffer & enzyme mix and heat for 5 minutes at 20 °C and 5 minutes at 65 °C, and clean with AMPure beads. Next, you mix the end-prepped DNA with Adapter mix, Blunt/TA Ligation master mix, wash with beads, add Adapter Bead Binding buffer, and elute. This all takes around a couple hours (although taking your time with the bead cleanups seems to help with DNA recovery) and now you’re ready to load the library-prepped DNA into the MinION flow cell!
I started the sequencing run using the MinKNOW software on 7/3/17. First off, the software determines how many active pores the flow cell contains, which looked pretty good: 497 active pores in group 1, 407 in group 2, 208 group 3, and 39 group 4. Then the run kicked off and the reads started flowing. About one hour in, roughly 15,000 reads had been produced, but it looked like my laptop was getting a bit sluggish (perhaps because I'm using an external SSD drive on my Vaio Sony laptop) and the pore count was dropping off. So I hit the stop acquire button, printed the MinKNOW report and the laptop seemed to catch up on the intensive computing required for the run. I saved the flow cell in the fridge and went on to check out the data from the reads.
Report given by MinKNOW. No surprise that most of the reads are short length (all the amplicons were ~600 bp to ~1.2 kb. The longer reads are likely the control DNA that ONT provides to run concurrently with your sample.
The barcode reads were basecalled and then demultiplexed with a program called Albacore, which split up barcodes 1 – 12 into different folders. I grabbed a few raw sequences from barcode 1 (the ALS 16S gene), threw it into a BLAST search, and to my pleasant surprise got a snake 16S BLAST hit! Other barcodes appeared to get the correct match as well, which was really encouraging.
The next step was to create a consensus for the barcode reads. To do so, we first tested reference based mapping. We used two references, the same PCR amplicon sequenced with Sanger and a reference for a different species from the same genus downloaded from NCBI. We then mapped the reads with BWA mem, an algorithm that can handle divergent reads and sorted and processed the reads using Samtools. We then called the consensus using ANGSD or Geneious, and mapped the reads back to it for post-mapping polishing of the consenus sequence. We performed the polishing using Nanopolish. We then assessed the consensus sequence quality using the Sanger sequence. We see a low error rate for base calls after polishing. The only difference between mapping against the Sanger read and the downloaded reference, was that we missed three few basepair long indels, which weren't present in the downloaded reference (from a different species). We are currently exploring de-novo approaches to create consensus sequences without the use of a reference sequence, such as a program called Canu and the LAST aligment tool.
So overall, after a test trial using the MiniPCR and MinION, we're ready to hit the jungles of Ecuador for real-time portable DNA sequencing! The trial looked promising for basecalling amplicons used for species identification, now to see if we can do it all in the field. Heading to the airport now with Stefan and will post updates soon!
-Aaron Pomerantz & Stefan Prost