Identifying long non-coding RNAs (lncRNAs) in non-model organisms is a major challenge because current annotations lack accurate transcript models. The transcriptomes of these organisms are compiled mostly from short-read RNA-seq data, and because of the limitations of transcript assembly, short-read data can yield only abstract transcript models: approximations that are often fragmented or incomplete. Since most lncRNA prediction tools use coding potential as a criterion, accurate, full-length transcript models are crucial for lncRNA detection. Abstract models can therefore produce a large number of false positives; for example, a fragment of a protein-coding transcript that lacks part of its open reading frame may be misclassified as non-coding. Long-read sequencing can bypass these issues and open up new strategies for lncRNA discovery.
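To illustrate how fragmented models inflate false positives, the sketch applies a simple ORF-length coding-potential filter to a synthetic coding transcript and two truncated fragments of it. This is a hypothetical example, not the method described in this workshop: the function names, the forward-strand-only ATG-to-stop ORF search, and the 100-codon and 200-nt cutoffs are all assumptions, and real prediction tools use richer coding-potential models.

```python
# Hypothetical sketch: an ORF-length coding-potential filter.
# Assumptions (not from the source): forward strand only, ATG starts,
# standard stop codons, a 100-codon ORF cutoff and a 200-nt length cutoff.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODING_ORF_CODONS = 100  # assumed cutoff
MIN_LNCRNA_LENGTH_NT = 200   # assumed cutoff


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start codon included, stop excluded) of the longest
    ATG-initiated ORF on the forward strand. An ORF with no stop codon before
    the end of the sequence is counted up to its last complete codon."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        if start is not None:
            # Open-ended ORF, e.g. in a fragment truncated before its stop codon.
            end = frame + ((len(seq) - frame) // 3) * 3
            best = max(best, (end - start) // 3)
    return best


def is_putative_lncrna(seq: str) -> bool:
    """Call a transcript non-coding if it is long enough but has no long ORF."""
    return (len(seq) >= MIN_LNCRNA_LENGTH_NT
            and longest_orf_codons(seq) < MIN_CODING_ORF_CODONS)


# Synthetic protein-coding transcript: a 150-codon ORF ending in TAA.
full_length = "ATG" + "GCT" * 149 + "TAA"
# Fragments of the kind an incomplete short-read assembly might produce.
missing_5prime = full_length[3:]    # start codon lost
missing_3prime = full_length[:240]  # stop codon and much of the ORF lost

for name, seq in [("full_length", full_length),
                  ("missing_5prime", missing_5prime),
                  ("missing_3prime", missing_3prime)]:
    print(name, len(seq), longest_orf_codons(seq), is_putative_lncrna(seq))
```

Under these assumptions, the full-length transcript is correctly called coding, while both fragments are called putative lncRNAs even though they derive from a coding transcript. This is the kind of false positive that accurate, full-length models are meant to avoid.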
We have developed methods for identifying putative lncRNA genes from long-read sequencing data. Using these methods, we found that current long-read technology can contribute considerably to the annotation of lncRNAs in non-model organisms.
A workshop presented at the Plant and Animal Genome XXIV Conference (2016).