The UMGC is pleased to announce that we have obtained a license from KeyGene (NL) that will enable us to provide Sequence-Based Genotyping (SBG) to academic and industry scientists throughout North and South America. SBG is gaining popularity as a cost-effective, high-throughput method for SNP discovery and genotyping on a genome-wide scale.
SBG will complement the PCR-based, array-based, and mass-spec-based genotyping platforms we currently operate by permitting rapid-turnaround, very cost-effective genotyping of any species, especially those for which there are no well-established reference maps or commercially available targeted reagents. Our goal is to provide a complete sample-to-genotype SBG workflow, including bioinformatic analysis and high-throughput DNA extraction when needed.
SBG is a “reduced representation” method. There are a number of technical variants, including Genotyping By Sequencing (GBS), Restriction Site-Associated DNA Sequencing (RAD-Seq), and Double Digest RAD-Seq (ddRAD-Seq), but the general principle in all of these methods is the same: gDNA samples are digested with restriction enzymes to reduce the complexity of the genome, ligated with adapters, multiplexed, and sequenced using Illumina short-read sequencing. Because the same “reduced representation” of the genome is thereby sequenced from sample to sample, a relatively small amount of sequencing provides sufficient coverage to genotype markers located within the restriction fragments.
For clients who have an existing SBG (or other reduced representation) protocol in hand that uses a pre-selected restriction enzyme, derived from their own work or from the literature, we can adapt the restriction digestion protocol to our own workflow by purchasing appropriate adapters. There is no cost to the client for this internal optimization effort, provided that they have contracted for at least one 96-well plate’s worth of SBG.
For clients who are interested in optimizing SBG performance, we provide SBG simulation and optimization pilot projects. For species with an existing reference genome assembly, we carry out an in silico genome-wide restriction digestion and provide options for robust genotyping of a specified number of loci. (Such a reference genome need not be closed; a gapped assembly is adequate for assay simulation.) To provide an accurate assessment of the number of informative markers that will be scored, we then perform a focused pilot experiment, using the simulated protocol on a small panel of samples, to confirm performance and estimate genetic diversity across recovered loci. Following successful wet-lab testing, the protocol is ready for production genotyping. See table below: Optimization Pilot #1.
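The in silico digestion step described above can be illustrated with a minimal Python sketch. This is not the UMGC pipeline: the function names, the toy sequence, and the size-selection window are all assumptions made for illustration. The sketch uses the PstI recognition site (CTGCAG, cut after the A) only as an example enzyme, searches a single strand, and does not handle degenerate recognition sites.

```python
import re


def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return fragment lengths.

    `cut_offset` is the number of bases from the start of the site to the
    cut position on the top strand (e.g. 5 for PstI, CTGCA^G).
    """
    seq = sequence.upper()
    # A zero-width lookahead also reports overlapping occurrences of the site.
    pattern = re.compile(f"(?={re.escape(site.upper())})")
    cuts = [m.start() + cut_offset for m in pattern.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def count_in_window(fragment_lengths, min_len, max_len):
    """Count fragments whose length falls inside a size-selection window."""
    return sum(min_len <= n <= max_len for n in fragment_lengths)


# Toy example only: a 28-base sequence containing two PstI sites.
toy = "AAAA" + "CTGCAG" + "TTTTTTTTTT" + "CTGCAG" + "GG"
fragments = digest(toy, "CTGCAG", cut_offset=5)
print(fragments)                          # [9, 16, 3]
print(count_in_window(fragments, 5, 20))  # 2
```

Running the same count for several candidate enzymes against a real assembly gives a rough estimate of how many loci each enzyme would yield, which is the kind of comparison an enzyme-selection simulation relies on.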
For species for which no reference genome has been generated, we provide two pilot project options. The best option is to carry out low-cost, relatively low-depth whole-genome sequencing (WGS, 10X-30X), in order to create a gapped assembly for the purpose of in silico SBG simulation and subsequent mapping of SBG reads. Following WGS, the simulation/optimization pilot project proceeds as described above. See table below: Optimization Pilot #2.
Alternatively, we can carry out an empirical wet-lab comparison of a panel of eight restriction enzymes to pick a suitable enzyme. This approach is less likely to yield a high-performance assay, but it is also less costly than WGS and de novo assembly. See table below: Optimization Pilot #3.
|Pilot||Used for||Scope|
|Optimization Pilot #1: In silico SBG simulation||Species for which a reference genome exists.||In silico simulation of SBG and selection of an optimal enzyme will be followed by SBG of a small number (3) of samples to confirm performance and estimate SNP diversity.|
|Optimization Pilot #2: WGS followed by in silico SBG simulation||Species for which a reference genome does not exist.||Low-depth (10X-30X) whole-genome sequencing and genome assembly are carried out to generate a gapped assembly to be used for in silico simulation of SBG. This step is followed by Optimization Pilot #1. The cost for Optimization Pilot #2 is dependent on the size of the genome of interest.|
|Optimization Pilot #3: Empirical enzyme selection||Species for which a reference genome does not exist.||A panel of 8 candidate enzymes is used to digest a single sample, with each library assessed by capillary electrophoresis and SBG. Enzyme selection is then followed by SBG of two additional biological samples.|
If submitting genomic DNA, we request that you provide a mass ≥ 500 ng, at a concentration ≥ 25 ng/ul and a volume ≥ 20 ul. Ideally, clients should quantify and quality-control their gDNA with both fluorimetry (e.g., PicoGreen) and UV spectroscopy (e.g., NanoDrop) prior to shipment to the UMGC.
DNA Buffer: we recommend that clients provide DNA in “no EDTA” buffer, such as QIAGEN’s “EB” elution buffer (10 mM Tris-Cl, pH 8.5, no EDTA), or “low EDTA” TE (e.g., pH 8.0 TE, 0.1 mM EDTA). Because EDTA chelates magnesium, it can inhibit the enzymatic steps of SBG, including restriction enzyme digestion and amplification.
Consumables and shipping: please submit DNA samples in skirted 96-well plates, closed firmly using a foil or heat-based sealing tape, frozen upright before packaging to ensure that the solution is not in contact with the seal, and shipped on dry ice to prevent thawing in transit. Each plate must contain ≤ 94 samples and ≥ 2 empty wells, the positions of which must be unique to each plate. Stacked plates should be separated from each other by cardboard dividers, to prevent the bottoms of wells from piercing the seal of the plate below, and taped firmly together to ensure that dividers remain in place. Plates are inspected upon arrival for intactness and evidence of well-to-well cross-contamination. Plates with evidence of damage, leakage, or cross-contamination may be flagged for re-shipment.
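As a sketch of how a submitter might check the empty-well requirement before shipping, the Python snippet below flags plates with fewer than two empty wells and plates that reuse another plate's empty-well positions. The function name and the input layout (a mapping from plate name to its empty wells) are hypothetical, and the sketch assumes that "unique to each plate" means no two plates share the same set of empty positions.

```python
def check_plate_layouts(empty_wells_by_plate, min_empty=2):
    """Return a list of problems found in the empty-well layout of each plate."""
    problems = []
    seen = {}  # set of empty wells -> first plate that used that set
    for plate, wells in empty_wells_by_plate.items():
        layout = frozenset(w.upper() for w in wells)
        if len(layout) < min_empty:
            problems.append(f"{plate}: fewer than {min_empty} empty wells")
        if layout in seen:
            problems.append(f"{plate}: same empty wells as {seen[layout]}")
        else:
            seen[layout] = plate
    return problems


# Hypothetical shipment of three plates.
shipment = {
    "Plate1": ["A1", "H12"],
    "Plate2": ["A1", "H12"],  # duplicates Plate1's empty positions
    "Plate3": ["B1"],         # only one empty well
}
for problem in check_plate_layouts(shipment):
    print(problem)
```

On a 96-well plate, leaving at least two wells empty also guarantees that the plate holds no more than 94 samples.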
DNA quantification: we prefer, and strongly advise, that clients use PicoGreen fluorimetry to quantify their gDNA, because UV spectroscopy (e.g., NanoDrop) often overestimates the concentration of DNA due to the presence of RNA, protein, and other UV-absorbing buffer constituents (e.g., EDTA). If PicoGreen methods are not available at your institution, note that samples will be re-quantified using PicoGreen once they arrive at the UMGC. Samples found to have a DNA concentration < 5 ng/ul or a volume < 20 ul may be flagged as having insufficient mass and/or concentration for processing.
DNA QC: A260/A280 and A260/A230 ratios are useful indicators of contaminants that may inhibit SBG processing. We encourage users to ensure that their samples have an A260/A280 of approximately 1.8 (no lower than 1.5), and an A260/A230 of no lower than 1.0.
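The submission guidelines above (mass, concentration, volume, and absorbance ratios) can be collected into a simple pre-shipment check. The following Python sketch is illustrative only: the `Sample` class and `qc_flags` function are hypothetical names, the thresholds are the requested values stated in this section, and mass is assumed to equal concentration multiplied by volume.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    conc_ng_per_ul: float
    volume_ul: float
    a260_a280: float
    a260_a230: float


def qc_flags(sample):
    """Return the submission guidelines that a sample does not meet."""
    flags = []
    if sample.conc_ng_per_ul < 25:
        flags.append("concentration below 25 ng/ul")
    if sample.volume_ul < 20:
        flags.append("volume below 20 ul")
    # Assumption: total mass (ng) = concentration (ng/ul) x volume (ul).
    if sample.conc_ng_per_ul * sample.volume_ul < 500:
        flags.append("mass below 500 ng")
    if sample.a260_a280 < 1.5:
        flags.append("A260/A280 below 1.5")
    if sample.a260_a230 < 1.0:
        flags.append("A260/A230 below 1.0")
    return flags


# Hypothetical samples for illustration.
good = Sample("leaf_01", conc_ng_per_ul=30, volume_ul=20, a260_a280=1.8, a260_a230=2.0)
poor = Sample("leaf_02", conc_ng_per_ul=10, volume_ul=20, a260_a280=1.4, a260_a230=2.0)
print(qc_flags(good))
print(qc_flags(poor))
```

Concentrations measured by UV spectroscopy may be overestimates, so values from PicoGreen fluorimetry are the better input for a check like this.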
DNA Extraction: The UMGC has a high-throughput DNA extraction protocol that has been shown to provide gDNA of suitable quality for SBG. The cost for DNA extraction is built into the SBG cost in our “Tissue-to-Data” service, and amounts to $6.70 (UMN pricing) or $9.11 (External pricing) per sample. Note that for a species or tissue type that we have not yet encountered, it is possible that a pilot extraction project will be required to ensure that the resulting DNA performs well in SBG assays.
The UMGC’s Sequence-based Genotyping (SBG) Service price includes two components: 1) SBG Library Preparation (table below) and 2) Next-generation Sequencing (next section). Pilot projects may also be required in some cases.
Library preparation cost depends on the source material (DNA or tissue) and the project scale, with lower costs achieved for larger numbers of samples.
|Scale¹||UMN pricing: From DNA²||UMN pricing: From Tissue³||External pricing: From DNA²||External pricing: From Tissue³|
|565 – 1,128||$10.97||$17.66||$15.22||$24.33|
|1,128 – 4,512||$10.08||$16.77||$14.05||$23.16|
1. SCALE = number of samples submitted in a single batch. For > 4,512 samples per batch, or annual volumes > 10,000 samples, rates as low as $12 (DNA) or $20 (tissue) per sample are possible.
2. DNA-TO-DATA SERVICE: 1) DNA quant/norm, 2) SBG library prep, 3) library QC and pooling, 4) Next-generation Sequencing, 5) informatics for primary QC, variant calling, and genotyping report generation.
3. TISSUE-TO-DATA SERVICE also includes DNA extraction. Extracted DNA will be returned to client upon request.
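There are several different NGS options that are suitable for SBG, with the primary differences between them being run type (high output or rapid run), read type (shorter single reads or longer paired-end reads), and plexity (the number of samples sequenced per lane).
The implications of these choices are as described here: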
High-output sequencing generates ≥ 220 M raw (≥ 180 M quality-filtered) SBG reads per lane at a lower cost per read, but may be associated with a longer turnaround time for smaller projects due to the need to fill an 8-lane flow cell with samples.
Rapid-run sequencing generates ≥ 120 M raw (100 M quality-filtered) SBG reads per lane at a higher cost per read, and is typically associated with a shorter turnaround time for smaller projects, thanks to its smaller-capacity 2-lane flow cell.
Shorter single reads (1x100 SR) are best suited to circumstances in which there is a relatively higher degree of sequence diversity, such that the probability of a shorter fragment containing a polymorphic marker is high. With short SRs, the cost per read is lower, but the cost per Mb is higher.
Longer paired-end reads (2x100 PE, 2x125 PE, 2x150 PE) are best suited to circumstances in which there is a relatively lower degree of sequence diversity, such that the probability of a shorter fragment containing a polymorphic marker is low, requiring longer fragments to capture a polymorphic tag. With long PE reads, the cost per read is higher, but the cost per Mb is lower.
Per-sample cost and per-sample read depth (coverage) are both inversely proportional to plexity, the number of samples per lane: as plexity doubles, cost per sample and read depth are halved. Plexity may also have an influence on project turnaround time, as higher plexities for smaller projects may be associated with partial flow cells (e.g., requiring just 2-3 lanes of an 8-lane flow cell). As it is uneconomical to run partially full flow cells, higher plexities may result in a delay as we await the completion of other projects to fill unoccupied lanes.
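The trade-off between plexity, read depth, and cost can be worked through with a few lines of Python. This is a back-of-the-envelope sketch: the read count is the quality-filtered high-output figure quoted above, while the lane cost is a hypothetical placeholder rather than a UMGC price, and reads are assumed to be split evenly across samples.

```python
def per_sample_yield(reads_per_lane, cost_per_lane, plexity):
    """Return (reads per sample, cost per sample) for one lane at a given plexity."""
    if plexity < 1:
        raise ValueError("plexity must be at least 1")
    return reads_per_lane / plexity, cost_per_lane / plexity


LANE_READS = 180_000_000  # quality-filtered reads per high-output lane (from the text)
LANE_COST = 1000.0        # hypothetical placeholder, not an actual price

for plexity in (48, 96, 192):
    reads, cost = per_sample_yield(LANE_READS, LANE_COST, plexity)
    print(f"{plexity:>3} samples/lane: {reads:,.0f} reads/sample, {cost:.2f} per sample")
```

In practice, pooling is never perfectly even, so the reads recovered for individual samples will vary around these averages.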
NGS cost per sample (table, column 9) depends on three NGS options: Run Type (column 1), Read Type (column 2), and Plexity (column 6), and can range widely depending on your scientific goals. Please consult with UMGC staff to understand the pros and cons of different NGS choices by contacting us at: firstname.lastname@example.org.
For pricing information, please download our SBG Sequencing pricing tables.
An SBG project requires a multi-step bioinformatics analysis in order to generate high-quality genotype calls. We convert the raw FASTQ files generated from your sequencing run into high-quality genotype calls using our custom-built bioinformatics pipelines, enabling you to get a jump-start on analyzing your data. We are able to generate genotype calls with or without a reference genome, for both diploid and polyploid species. Our pipelines have been tuned using reference datasets to ensure high-quality genotype calls.
We provide an analysis report containing quality-control metrics and plots, along with genotype calls in VCF format.
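As a starting point for working with the delivered genotype calls, the Python sketch below computes a per-sample call rate from the lines of an uncompressed VCF file. It is a minimal illustration rather than part of the UMGC report: the function name and the toy records are assumptions, and the code relies only on the standard VCF layout (nine fixed columns followed by one column per sample, with `GT` listed in the FORMAT column).

```python
def call_rates(vcf_lines):
    """Return {sample: fraction of sites with a non-missing genotype}."""
    samples, called, n_sites = [], [], 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            called = [0] * len(samples)
            continue
        gt_index = fields[8].split(":").index("GT")
        n_sites += 1
        for i, entry in enumerate(fields[9:]):
            genotype = entry.split(":")[gt_index]
            if "." not in genotype:  # "./." or "." marks a missing call
                called[i] += 1
    if n_sites == 0:
        return {}
    return {name: count / n_sites for name, count in zip(samples, called)}


# Toy records for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:9\t0/0:7",
]
print(call_rates(toy_vcf))  # {'S1': 1.0, 'S2': 0.5}
```

For large or compressed files, a dedicated VCF library is likely a better choice than line-by-line parsing.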