Global Intersection of Long Non-Coding RNAs with Processed and Unprocessed Pseudogenes in the Human Genome

Pseudogenes are abundant in the human genome and were long regarded as purely nonfunctional gene fossils. Recent observations point to a role for pseudogenes in regulating genes both transcriptionally and post-transcriptionally in human cells.

To computationally interrogate the network space of integrated pseudogene and long non-coding RNA regulation in the human transcriptome, researchers at Wayne State University developed and implemented an algorithm that identifies all long non-coding RNA (lncRNA) transcripts overlapping the genomic spans, and specifically the exons, of any human pseudogene in either sense or antisense orientation. As inputs to their algorithm, the researchers imported three public pseudogene repositories: GENCODE v17 (processed and unprocessed, Ensembl 72), Retroposed Pseudogenes V5 (processed only), and Yale Pseudo60 (processed and unprocessed, Ensembl 60). They also imported two public lncRNA catalogs (Broad Institute and GENCODE v17), NCBI annotated piRNAs, and NHGRI clinical variants. The data sets were retrieved from the UCSC Genome Database using the UCSC Table Browser.
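
The exon-level overlap test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the `Exon` record, the 0-based half-open coordinate convention, the function names `exons_overlap` and `classify_overlaps`, and the rule that a pair with overlaps in both orientations is labeled "complex" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    gene_id: str


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if the two exons share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_overlaps(pseudo_exons: Iterable[Exon],
                      lnc_exons: Iterable[Exon]) -> Optional[str]:
    """Classify a pseudogene/lncRNA pair by the orientation of its exon overlaps.

    Returns "sense", "antisense", "complex" (assumed here to mean overlaps in
    both orientations), or None when no exons overlap.
    """
    lnc_list = list(lnc_exons)
    orientations = set()
    for p in pseudo_exons:
        for l in lnc_list:
            if exons_overlap(p, l):
                orientations.add("sense" if p.strand == l.strand else "antisense")
    if not orientations:
        return None
    if len(orientations) == 2:
        return "complex"
    return next(iter(orientations))


# Toy coordinates, invented for illustration only.
pseudo = [Exon("chr1", 100, 200, "+", "PSEUDO1")]
lnc_as = [Exon("chr1", 150, 300, "-", "LNC1")]
lnc_far = [Exon("chr1", 500, 600, "+", "LNC2")]

print(classify_overlaps(pseudo, lnc_as))
print(classify_overlaps(pseudo, lnc_far))
```

The pairwise scan is quadratic in the number of exons, which is fine for a toy example; a genome-scale run would more plausibly sort the intervals or use an interval index.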

Computationally derived intersections of all pseudogene, lncRNA, cDNA, and EST databases.

| File 1 GID | File 2 GID | Exon-exon gene IDs | Sense Overlaps | Antisense Overlaps | Complex Loci |
|---|---|---|---|---|---|
| GENCODE 17 Pseudogenes | GENCODE 17 lncRNA | 163 | 87 | 56 | 20 |
| GENCODE 17 Pseudogenes vs. GENCODE 17 lncRNA | cDNA | 64 | 33 | 4 | 27 |
| GENCODE 17 Pseudogenes vs. GENCODE 17 lncRNA | EST | 68 | 4 | 2 | 62 |
| GENCODE 17 Pseudogenes vs. GENCODE 17 lncRNA | cDNA and EST | 48 | 23 | 3 | 22 |
| GENCODE 17 Pseudogenes | Human lincRNA | 870 | 725 | 45 | 100 |
| GENCODE 17 Pseudogenes vs. Human lincRNA | cDNA | 371 | 216 | 15 | 140 |
| GENCODE 17 Pseudogenes vs. Human lincRNA | EST | 646 | 53 | 18 | 575 |
| GENCODE 17 Pseudogenes vs. Human lincRNA | cDNA and EST | 325 | 186 | 12 | 127 |
| Retro Ali5 | GENCODE 17 lncRNA | 211 | 79 | 120 | 12 |
| Retro Ali5 vs. GENCODE 17 lncRNA | cDNA | 78 | 35 | 17 | 26 |
| Retro Ali5 vs. GENCODE 17 lncRNA | EST | 162 | 30 | 16 | 116 |
| Retro Ali5 vs. GENCODE 17 lncRNA | cDNA and EST | 69 | 32 | 15 | 22 |
| Retro Ali5 | Human lincRNA | 557 | 405 | 108 | 44 |
| Retro Ali5 vs. Human lincRNA | cDNA | 129 | 61 | 26 | 42 |
| Retro Ali5 vs. Human lincRNA | EST | 381 | 64 | 34 | 283 |
| Retro Ali5 vs. Human lincRNA | cDNA and EST | 113 | 53 | 23 | 37 |
| Yale 60 | GENCODE 17 lncRNA | 105 | 66 | 25 | 14 |
| Yale 60 vs. GENCODE 17 lncRNA | cDNA | 40 | 25 | 2 | 13 |
| Yale 60 vs. GENCODE 17 lncRNA | EST | 46 | 2 | 1 | 43 |
| Yale 60 vs. GENCODE 17 lncRNA | cDNA and EST | 31 | 18 | 1 | 12 |
| Yale 60 | Human lincRNA | 547 | 468 | 24 | 55 |
| Yale 60 vs. Human lincRNA | cDNA | 192 | 123 | 6 | 63 |
| Yale 60 vs. Human lincRNA | EST | 372 | 38 | 12 | 322 |
| Yale 60 vs. Human lincRNA | cDNA and EST | 157 | 96 | 5 | 56 |
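
In every row of the table, the exon-exon gene ID count equals the sum of the sense, antisense, and complex counts, so the three orientation categories appear to partition each intersection. The short Python check below illustrates this on three rows copied from the table; the tuple layout and variable names are chosen only for this example.

```python
# (label, exon-exon gene IDs, sense, antisense, complex), copied from the table.
rows = [
    ("GENCODE 17 Pseudogenes / GENCODE 17 lncRNA", 163, 87, 56, 20),
    ("Retro Ali5 / Human lincRNA", 557, 405, 108, 44),
    ("Yale 60 / Human lincRNA, cDNA and EST", 157, 96, 5, 56),
]

for label, total, sense, antisense, complex_loci in rows:
    # The total is consistent with each gene ID falling in exactly one category.
    assert total == sense + antisense + complex_loci, label
    print(f"{label}: {sense} + {antisense} + {complex_loci} = {total}")
```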

The researchers identified 2277 loci containing exon-to-exon overlaps between pseudogenes, both processed and unprocessed, and long non-coding RNA genes. Of these loci, 1167 had GenBank EST and full-length cDNA support, providing direct evidence of transcription on one or both strands with exon-to-exon overlaps. The analysis converged on 313 pseudogene-lncRNA exon-to-exon overlaps that were bidirectionally supported by both full-length cDNAs and ESTs. In the process of identifying transcribed pseudogenes, the researchers generated a comprehensive, positionally non-redundant encyclopedia of human pseudogenes, drawing upon multiple, formerly disparate public pseudogene repositories. Collectively, these observations suggest that pseudogenes are pervasively transcribed on both strands and are common drivers of gene regulation.

Venn diagram of the genomic positionally-nonredundant intersection of three major public pseudogene databases.

The resulting non-redundant dataset yields a more inclusive and comprehensive pseudogene database of 20945 pseudogene loci and alleviates problems caused by accession number synonymity within and between the three databases. (Accession number synonyms point to the same pseudogene in the human genome, but without positionally non-redundant collapsing, downstream programs may misread them as multiple distinct pseudogenes.)
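
The idea of collapsing accession synonyms by genomic position can be sketched as follows. This is only an illustration under an assumed definition: records are treated as redundant when they share identical chromosome, start, end, and strand, which may be stricter or looser than the criterion the researchers actually used. The accessions, coordinates, and the function name `collapse_by_position` are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: (source database, accession, chrom, start, end, strand).
records = [
    ("GENCODE17", "ACC_A", "chr1", 1000, 2000, "+"),
    ("Yale60", "ACC_B", "chr1", 1000, 2000, "+"),
    ("RetroAli5", "ACC_C", "chr2", 500, 900, "-"),
]


def collapse_by_position(records):
    """Group records that share identical coordinates and strand into one locus."""
    loci = defaultdict(set)
    for source, accession, chrom, start, end, strand in records:
        loci[(chrom, start, end, strand)].add((source, accession))
    return dict(loci)


loci = collapse_by_position(records)
for position, synonyms in sorted(loci.items()):
    # Each position is reported once, with all accession synonyms attached.
    print(position, sorted(synonyms))
print(len(loci), "non-redundant loci from", len(records), "records")
```

Keying on position rather than accession means that two databases naming the same locus differently contribute one entry, with both names retained as synonyms.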

Milligan MJ, Harvey E, Yu A, Morgan AL, Smith DL, Zhang E, Berengut J, Sivananthan J, Subramaniam R, Skoric A, Collins S, Damski C, Morris KV, Lipovich L. (2016) Global Intersection of Long Non-Coding RNAs with Processed and Unprocessed Pseudogenes in the Human Genome. Front Genet 7:26.
