Technologies

We specialize in long-read and long-range technologies

To generate high-quality reference genome assemblies (contig N50 > 1 Mb, scaffold N50 > 10 Mb, and at least QV40), we combine PacBio HiFi long reads for generating contigs, Bionano optical maps and Hi-C for scaffolding, and ultra-long Oxford Nanopore reads for gap filling. We host several sequencing instruments and generate most of the data in our lab.
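The assembly targets above can be checked with a few lines of code. The following is a minimal sketch, not part of our pipeline: the function names `n50` and `meets_targets` are hypothetical, and it assumes contig and scaffold lengths are given in base pairs and that a consensus QV has already been estimated elsewhere.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no sequences, so no N50


def meets_targets(contig_lengths, scaffold_lengths, qv):
    """Check the thresholds quoted in the text (hypothetical helper)."""
    return (
        n50(contig_lengths) > 1_000_000          # contig N50 > 1 Mb
        and n50(scaffold_lengths) > 10_000_000   # scaffold N50 > 10 Mb
        and qv >= 40                             # at least QV40
    )
```

For example, `meets_targets([2_000_000, 500_000], [20_000_000, 5_000_000], 42)` evaluates the three criteria on a toy assembly with two contigs and two scaffolds.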

PacBio Sequel IIe

The PacBio Sequel IIe instrument can generate 30 Gb of high-quality HiFi data with an average read length of 15-20 kb and a minimum read quality of QV20. Because HiFi reads are typically more than 99.9% accurate, they do not require polishing. We use this technology to assemble contigs.
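Quality values (QV) are Phred-scaled, so QV = -10 log10(error probability). The sketch below shows the conversion in both directions, which makes it easier to relate a QV threshold to a percent accuracy. The function names are illustrative only.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to accuracy (0-1)."""
    return 1 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy):
    """Convert accuracy (0-1, strictly below 1) to a Phred-scaled QV."""
    return -10 * math.log10(1 - accuracy)
```

Under this scale, QV20 corresponds to 99% accuracy, QV30 to 99.9%, and QV40 to 99.99%.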

Bionano Saphyr

The Bionano Saphyr instrument generates optical genome maps. These maps are assembled from long molecules ranging from 150 kb to multiple megabases. The assembled maps can then be used to scaffold the PacBio assembly.

Arima Hi-C

We use Arima Hi-C to perform a second round of scaffolding. Because this technology detects long-range interactions, it makes it possible to scaffold entire chromosomes. We have also started to use Hi-C for haplotype phasing of the PacBio contigs.

ONT PromethION

The Oxford Nanopore Technologies (ONT) PromethION is our latest acquisition. This technology can sequence ultra-long DNA fragments (>100 kb), albeit with lower accuracy than HiFi data. HiFi reads cannot span some regions of the genome, because of sequencing bias or repeats that are too long, which leaves gaps in the assembly. We have started to use ultra-long ONT data to fill these remaining gaps.
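Before gaps can be filled, they have to be located. The sketch below assumes that gaps in a scaffold are encoded as runs of `N`, as is conventional in FASTA files, and reports their coordinates. The function name `find_gaps` and the `min_len` parameter are hypothetical and do not describe our actual gap-filling workflow.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return half-open (start, end) coordinates of runs of N in a sequence.

    Assumes gaps are encoded as N or n; runs shorter than min_len are skipped.
    """
    return [
        (match.start(), match.end())
        for match in re.finditer(r"[Nn]+", sequence)
        if match.end() - match.start() >= min_len
    ]
```

Each reported interval is a candidate region that an ultra-long read spanning both flanks could close.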

VGP pipeline v. 2.0

This is the latest version of our VGP genome assembly pipeline (v. 2.0), which uses PacBio HiFi data, Bionano optical maps, and Hi-C data. We are actively working on the next version of the pipeline, which will incorporate ONT data and use Hi-C data for haplotype phasing without parental information.

Our current tracking and management systems

A) Screenshot of our system's user interface. This software collects all the sequencing run information (summary statistics, file location, sample name) and aggregates it by project. B) Project status information is publicly viewable through our VGL project tracker. This information is periodically synced with our database. C) Our internal LIMS is modular (module 1: Raw Data Manager, module 2: Project Manager, module 3: Metadata Manager, module 4: Taxonomy Manager) and is built around a central database that interacts with various third-party systems through API queries and custom scripts.
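The per-project aggregation described in panel A can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `SequencingRun` record, its field names, and `aggregate_by_project` are hypothetical and do not reflect the actual schema or code of our LIMS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SequencingRun:
    """Hypothetical run record; field names are illustrative only."""
    project: str
    sample_name: str
    file_location: str
    yield_gb: float  # stand-in for a run's summary statistics


def aggregate_by_project(runs):
    """Group run records by project and total their yield."""
    summary = defaultdict(
        lambda: {"runs": 0, "total_yield_gb": 0.0, "samples": set()}
    )
    for run in runs:
        entry = summary[run.project]
        entry["runs"] += 1
        entry["total_yield_gb"] += run.yield_gb
        entry["samples"].add(run.sample_name)
    return dict(summary)
```

A real system would read these records from the central database rather than from in-memory objects; the grouping logic would stay much the same.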