Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) derived from genomic regions that are antisense, intronic, intergenic, or overlapping with respect to protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology.
Here, researchers from the Garvan Institute of Medical Research discuss concepts and computational methods for identifying structural domains in lncRNAs from genomic and transcriptomic data. In the first part, they briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, they describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen that uses comparative genomics to detect putative functional lncRNA motifs.
Representation of RNA secondary structure predictions for single sequences
(a) An RNA base-pairing probability matrix representing both the minimum free energy (MFE) structure prediction (below the diagonal) and suboptimal base-pairing probabilities (above the diagonal) of a serine tRNA that forms five helices. The RNA sequence of interest is displayed on the X and Y axes, and each dot represents a possible base pairing between bases (x,y). The size of each dot indicates the frequency (or probability) of that base pairing in a Boltzmann ensemble of suboptimal structures, as calculated by McCaskill’s partition function algorithm in the Vienna RNA package. The base pairs forming the validated biological structure (b) are highlighted in blue and numbered accordingly, whereas the unpaired bases forming the anticodon are highlighted in green. (c) The MFE prediction forms a structure that diverges substantially from the actual tRNA, although the biological structure is perceptible in the suboptimal base pairings. (d) A base-pairing probability matrix generated by the RNAplfold algorithm on a ~400 nt section of the 3′ end of the NEAT1 lncRNA. Locally stable base pairings are displayed as described for (a); however, the sequence is represented on the diagonal (i.e., the upper quadrant of (a) is rotated 45°). In the lower left, the bases associated with the base pairs (dots) are highlighted in blue. In the lower right, the tRNA-like structure at the 3′ end of NEAT1 is highlighted in red.
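To make the idea of a base-pairing probability matrix more concrete, the following is a minimal Python sketch of a McCaskill-style partition function calculation. It is not the Vienna RNA implementation: it assumes a toy energy model in which each base pair contributes a fixed, illustrative energy (the values in `PAIR_ENERGY` are assumptions, not measured nearest-neighbor parameters), it ignores stacking and loop energies, and its probability step runs in O(n^4) time, so it is only suitable for short sequences. The function names are hypothetical.

```python
import math

# Illustrative pair energies in kcal/mol (assumed values, not Turner parameters).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}
RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 37 degrees C in kelvin
MIN_LOOP = 3             # minimum number of unpaired bases inside a hairpin


def pair_weight(seq, i, j):
    """Boltzmann factor for pairing positions i and j (0-based), or 0.0 if disallowed."""
    energy = PAIR_ENERGY.get((seq[i], seq[j]))
    if energy is None or j - i <= MIN_LOOP:
        return 0.0
    return math.exp(-energy / RT)


def partition_table(seq):
    """Return (Z, z), where z(i, j) sums the weights of all nested structures on seq[i..j]."""
    n = len(seq)
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # An empty interval has exactly one structure (the empty one).
        return Z[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = z(i, j - 1)            # case 1: base j is unpaired
            for k in range(i, j):          # case 2: base j pairs with base k
                w = pair_weight(seq, k, j)
                if w:
                    total += z(i, k - 1) * w * z(k + 1, j - 1)
            Z[i][j] = total
    return Z, z


def pair_probabilities(seq):
    """Return P, where P[i][j] is the ensemble probability that i pairs with j (i < j)."""
    n = len(seq)
    _, z = partition_table(seq)
    P = [[0.0] * n for _ in range(n)]
    for span in range(n - 1, MIN_LOOP, -1):    # widest pairs first
        for i in range(n - span):
            j = i + span
            w = pair_weight(seq, i, j)
            if not w:
                continue
            inside = w * z(i + 1, j - 1)
            # Contribution where (i, j) is not enclosed by any other pair.
            total = z(0, i - 1) * inside * z(j + 1, n - 1) / z(0, n - 1)
            # Contribution where (i, j) sits directly inside an enclosing pair (k, l).
            for k in range(i):
                for l in range(j + 1, n):
                    if P[k][l]:
                        total += (P[k][l] * z(k + 1, i - 1) * inside
                                  * z(j + 1, l - 1) / z(k + 1, l - 1))
            P[i][j] = total
    return P
```

Calling `pair_probabilities("GGGAAAUCCC")` returns a matrix whose upper triangle plays the same role as the upper half of the dot plot in panel (a): larger entries correspond to larger dots. A realistic analysis would use the full nearest-neighbor energy model implemented in the Vienna RNA package rather than this simplified one.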