RNA-Seq benchmarking study: A comparison of Ultima and Illumina sequencing platforms

The University of Minnesota Genomics Center (UMGC) has added the Ultima Genomics UG 100 platform to our sequencing services. To determine if the UG 100 produces results comparable to Illumina sequencers, UMGC sequenced samples from several RNA-seq projects using both the UG 100 and the Illumina NovaSeq 6000. The primary goal was to evaluate whether transitioning from Illumina to the UG 100 platform can yield equivalent results.

Materials and methods: 

An identical set of 97 samples, prepared using either the TruSeq Stranded mRNA Kit or the TakaraBio SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian, was sequenced on both platforms.

Table of projects and library prep

Both kits attach Illumina adapters to cDNA fragments. The UG 100 protocol does not require these adapters, but in this study they were sequenced along with the inserts. They made up roughly the first 70 base pairs of each UG 100 read, which is the highest-quality segment of the read, and were removed before analysis. Additionally, because the UG 100 generates only forward (R1) reads, the comparison used only the R1 reads from the Illumina data.
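The adapter removal described above can be illustrated with a minimal Python sketch. The trimming tool used in the study is not named here, so the function below is hypothetical: it assumes the adapter occupies a fixed-length prefix of each read (70 bases, matching the approximate figure above), whereas a real pipeline would more likely match the adapter sequence itself.

```python
def trim_leading_bases(seq, qual, n_adapter=70):
    """Drop the first n_adapter bases of a read and the matching quality characters.

    n_adapter=70 is an assumption based on the approximate adapter length
    quoted in the text; it is not a documented pipeline setting.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return seq[n_adapter:], qual[n_adapter:]


# Toy read: 70 placeholder adapter bases followed by a 10-base insert.
read_seq = "A" * 70 + "GATTACAGAT"
read_qual = "I" * 70 + "F" * 10

insert_seq, insert_qual = trim_leading_bases(read_seq, read_qual)
print(insert_seq, insert_qual)
```

Because the trimmed prefix is the highest-quality part of a UG 100 read, removing it lowers the mean quality of what remains, which is relevant to the quality comparison in the results.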

Results:

UG 100 base quality scores are slightly lower than Illumina. Mean base quality scores from the UG 100 were marginally lower than those from Illumina (approximately Q30-32 vs. Q34-36), a difference that is not expected to affect downstream data analyses.

Table of library preps
Figure 1. Base quality scores averaged across all bases of all reads in a sample for read 1 in Illumina vs. UG 100 sequencing. In these plots each dot is one of the samples in the library pool, but many dots overlap because there was little variability in mean quality. 
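To put the quality ranges in perspective, a Phred score Q corresponds to a base-call error probability p through Q = -10 log10(p), so Q30 is about 1 error in 1,000 bases and Q36 is about 1 in 4,000. The short Python sketch below shows that conversion and a per-read mean quality like the one averaged in Figure 1. It assumes Phred+33 FASTQ encoding, and the function names are illustrative rather than taken from the study's pipeline.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def mean_quality(qual_string, offset=33):
    """Arithmetic mean of the Phred scores in one FASTQ quality string.

    offset=33 assumes the common Phred+33 encoding.
    """
    scores = [ord(ch) - offset for ch in qual_string]
    return sum(scores) / len(scores)


# Error probabilities at the ends of the ranges reported for each platform.
for q in (30, 32, 34, 36):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.5f}")
```

Both reported ranges correspond to error rates of 0.1% or lower per base.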

The percentage of unique reads was slightly higher with UG 100 sequencing for every sample. The percentage of reads aligning to the reference genome was also notably higher with the UG 100: roughly 85-98%, compared with roughly 60-95% for Illumina. Illumina reads were 151 bp long, whereas UG 100 reads varied in length from approximately 50 to 285 bp. The variable length of UG 100 reads may explain the differences in both the percentage of unique reads and the percentage of reads aligning to the reference genomes.

Gene expression counts between platforms are highly correlated. Comparisons of gene expression counts between the UG 100 and NovaSeq 6000 datasets revealed strong correlations. The sample with the highest correlation, represented in the plot on the right below, showed an r-value of 0.96, while the sample with the lowest correlation, on the left, still exhibited a high r-value of 0.892.

Figure 2. Samples with the smallest (left) and largest (right) r-values. Reads per million (RPM)-transformed gene counts were log10 transformed after adding a pseudocount equal to half of the smallest RPM-transformed value. Pearson correlation tests were performed between these transformed gene abundances detected by both sequencing platforms. 

The data demonstrate a high correlation in expression counts between the two platforms. They also suggest that using single-end rather than paired-end reads does not substantially affect gene expression counts.

Pearson correlation (R) metrics for all samples
Figure 3. Pearson correlation statistics (R) between paired samples sequenced using Illumina and UG 100 sequencers. Reads per million (RPM)-transformed gene counts were log10 transformed after adding a pseudocount equal to half of the smallest RPM-transformed value. Pearson correlation tests were performed between these transformed gene abundances detected by Illumina vs. UG 100 for every sample.
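The transformation described in the figure captions can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's actual code: the pseudocount is taken as half of the smallest non-zero RPM value across both platforms for a given sample pair (the captions do not say how zeros or the two platforms are handled), and all function names and the toy counts are hypothetical.

```python
import math


def rpm(counts):
    """Scale raw gene counts to reads per million (RPM)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def platform_correlation(illumina_counts, ug_counts):
    """Pearson r between log10(RPM + pseudocount) values for one sample."""
    ill_rpm = rpm(illumina_counts)
    ug_rpm = rpm(ug_counts)
    # Assumption: pseudocount = half the smallest non-zero RPM in the pair.
    pseudocount = min(v for v in ill_rpm + ug_rpm if v > 0) / 2
    ill_log = [math.log10(v + pseudocount) for v in ill_rpm]
    ug_log = [math.log10(v + pseudocount) for v in ug_rpm]
    return pearson_r(ill_log, ug_log)


# Toy counts for five genes in one sample (made-up numbers, not study data).
illumina = [120, 30, 0, 850, 15]
ug100 = [110, 42, 3, 790, 9]
print(f"r = {platform_correlation(illumina, ug100):.3f}")
```

The pseudocount keeps genes with zero counts on one platform in the comparison, since the logarithm of zero is undefined. Scaling to RPM first removes differences in sequencing depth between the two runs.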

Conclusions: 

Single-end variable-length sequencing on the Ultima UG 100 sequencer produces results comparable to single-end 151 bp Illumina NovaSeq 6000 sequencing. Read quality was slightly higher with Illumina sequencing in this study. However, if cDNA were prepared using a native UG 100 RNA library prep kit that does not add Illumina adapters, UG 100 quality scores might improve, because adapter sequences would not need to be trimmed from the high-quality beginnings of reads. While it is not advisable to switch sequencing methods mid-project, research conducted on the UG 100 sequencer is expected to produce results comparable to Illumina NovaSeq 6000 sequencing, possibly with more aligned reads as a result of variable read length sequencing.

References:

  1. UMGC full benchmarking study with supplementary material
  2. UMGC GenoFest Lightning Talks: UG 100 RNA-Seq Performance (x.500 accessible)
  3. UMGC UG 100 Sequencing Service