Plant Stress RNA-seq Nexus Tutorial

Plant Stress RNA-seq Nexus: a stress-specific transcriptome database in plant cells

The framework of the database construction in the PSRN database

Research and development laboratories

The main interface of the database

The web function in the analysis results panel (Right panel)

Tab of DE coding transcripts (differentially expressed protein-coding transcripts)

Tab of DE IncRNAs (differentially expressed protein-coding transcripts)

Tab of mRNA-lncRNA coexpression network

The example applications of PSRN


Plant Stress RNA-seq Nexus: a stress-specific transcriptome database in plant cells  

Abiotic and biotic stresses are the most harmful factors affecting the growth and productivity of crops and leading to food crisis worldwide. The transcriptome profiling of various environmental stress conditions can provide insights into the molecular mechanisms of stress response in plant. RNA Sequencing (RNA-Seq) is a revolutionary tool that has been used extensively in plant research, which provides higher resolution than microarray technology and it can provide coding-transcript profiling and long non-coding RNA (lncRNA) profiling. Recently, global transcriptome profiling analysis using RNA-seq has been reported to identify differentially expressed lncRNAs, coding genes and alternatively spliced isoforms in response to environmental stresses, such as salt, heat, cold, drought, light, ozone, excessive boron, and pathogen infection.

Although many RNA-seq datasets are publically available in plants, the large-scale RNA-seq database has not been reported yet in plants. In this proposal, we constructed the first large-scale plant RNA-seq database, the plant stress RNA-seq nexus (PSRN, http://syslab5.nchu.edu.tw/PSRN) that achieves the following unprecedented features: (i) large-scale and comprehensive data archives and analyses, including coding-transcript profiling, long non-coding RNA (lncRNA) profiling; (ii) phenotype-oriented data organization and searching and (iii) the visualization of expression profiles, differential expression. Figure 1 shows the framework of the database construction. In the PSRN database, we systematically collected stress-specific RNA-seq datasets from Sequence Read Archive (SRA) and NCBI Gene Expression Omnibus (GEO). It resulted in 12 plant species and 26 RNA-seq datasets including 133 subsets and 1237 samples. Each dataset has several stress-specific subsets, and each subset contained a group of RNA-seq samples with specific stress condition, e.g. salt, heat cold, drought, and light. We manually created subsets and then assigned samples to subsets according to the description of the datasets and samples. To identify stress-specific differentially expressed transcripts (DETs) in each dataset, we selected the subsets with at least three samples, and then performed t-test between two subsets without overlap samples. It resulted in 259 stress-specific subset pairs.

The framework of the database construction in the PSRN database

Figure 1. The framework of the database construction in PSRN. The plants stress RNA-Seq datasets were collected from NCBI GEO and SRA, and then we classified all samples into phenotype-specific subsets for each dataset. In RNA-seq data processing, Bowtie2 and eXpress were used to calculate isoform expressions with references collected from Phytozome, Ensembl Plants, and PopGenIE. Finally, we calculated the log2 scale T-test and FDR between two subsets belong the same dataset, and then constructed the user interface for PSRN.


Research and development laboratories


The main interface of the database

As shown in Figure 2, the PSRN database provides a tree structure in the species and subset panel, which facilitates searching and browsing stress-specific subsets. When users select a subset, the associated subset pairs are subsequently listed in the subset-pair panel. When users select a subset pair, the web server shows the detailed description of dataset and subsets.

The PSRN database provides a tree structure in the plant species, which facilitates searching and browsing related species and stresses subsets.
Step 1. Plant species dataset panel (upper left): Users select a plant species and stress-specific subset, the associated subset pairs are subsequently listed in the subset-pair panel. The hierarchical menu illustrates which subsets are associated with which species. A subset is a group of RNA-seq samples associated with a specific stress type or genotype, e.g. 3 weeks old C24 Arabidopsis rosette treated with 350 ppb ozone exposure for 2hr, or Fe-deficient (1 mM) Chlamydomonas reinhardtii.

Step 2. Subset-pair panel (bottom left): When users select a subset in the first panel, PSRN shows all associated subset pairs in this panel.

Step 3. Profile panel (right): after selecting a subset pair in the second panel, PSRN displays a detailed description of the dataset and the RNA-Seq analysis results simultaneously. The RNA-Seq analysis results are described further in next section.

Figure 2. Screenshots of the web interface of PSRN. The PSRN user interface consists of three major panels as follows: (1) Species and subset panel: users can browse plant species and phenotype-specific subsets. (2) Subset-pair panel: given the phenotype-specific subset, users can select an associated subset pair. (3) Expression profile panel: subset-pair information and differential expressed protein-coding transcripts were shown in the panel and sorted by significance level.

The web function in the analysis results panel (Right panel)

As shown in Figure 3, In the right panel, there are three tab panels as follows:
(i)        DE coding transcripts: it shows the expression profiles of differentially expressed (DE) protein-coding transcripts.
(ii)        DE lncRNAs: it shows the expression profiles of DE lncRNAs.
(iii)        lncRNA coexpression network: we performed the correlation analysis of expression profiles between coding transcripts and lncRNAs. When selecting top 10/15/20/25 connections, the tab panel shows the coexpression network using the most significant negative correlation between coding transcripts and lncRNAs.

Figure 3. In the RNA-Seq analysis results panel, there are three analysis tabs: 

(i) DE coding transcripts: it visualizes the expression profiles of differentially expressed (DE) protein-coding transcripts.

(ii) DE lncRNAs: it visualizes the expression profiles of DE lncRNAs.

(iii) mRNA-lncRNA coexpression network: when searching gene and lncRNA in this tab panel, it visualizes the coexpression network using the most significant negative and positive correlations between coding transcripts and lncRNAs.

 

Tab of DE coding transcripts (differentially expressed protein-coding transcripts)

Given a subset pair, PSRN visualizes the expression profiles of all differentially expressed (DE) protein-coding transcripts, sorted by the significance level (p-value or FDR) between two subsets. As shown in Figure 4, when users use Search in the expression panel, they can input a transcript ID, KEGG Orthology/name, or RefSeq ID into the auto-complete field that provides quickly searching and selecting the partially matched terms. Furthermore, by entering more characters, it will filter down the list to better matches. At last, PSRN generates the expression profiles of all isoforms in the search result. In addition, users can download the result generated from PSRN as a file with PDF format. Moreover, users can select P value or FDR threshold and regulation types to identify significantly differential expressed transcripts.

Figure 4. The functions of the DE coding transcripts. Users can filter DE transcripts by selecting the Up regulation, Down regulation, or Up/Down regulation. Additionally, users can filter the results based on either p-value or FDR threshold. The interface displays the rank, the p-value, the FDR, the average expression values, the transcript ID, and the KEGG and Refseq annotation for each transcript. The colors along the color gradient line indicate 0.5 standard deviations of the expression value (FPKM).

Tab of DE IncRNAs (differentially expressed protein-coding transcripts)

Given a subset pair in Arabidopsis thaliana and Oryza sativa, PSRN visualizes the expression profiles of DE lncRNAs sorted by the significance level between two subsets. All functions, e.g. Search, regulation types, and to significant threshold, are similar to DE coding transcripts and Figure 4.

Tab of mRNA-lncRNA coexpression network

As shown in Figure 5, given a subset pair in Arabidopsis thaliana and Oryza sativa, and then PSRN visualizes an mRNA-lncRNA co-expression network, displaying coding transcripts and lncRNAs with significant correlations between coding transcripts and lncRNAs.

Figure 5. mRNA-lncRNA coexpression network. The lncRNA regulatory network function in A. thaliana and O. sativa presents the regulatory network according to the correlation of expressions between lncRNAs and protein-coding transcripts. Users can input a gene symbol and an lncRNA name, and then select a correlation threshold. To select the significant connections between coding transcripts and lncRNAs, users can set the option of the top N connections (N = 10, 15, 20 or 25), and then the web server selects the most significant N connections between the given coding transcript and lncRNA. Chartreuse circle: lncRNAs; Orange circle: protein coding transcript isoforms; Red line: positive correlation; Aqua line: negative correlation.

The example applications of PSRN

To demonstrate the biological importance of PSRN, we used cold-related transcripts and HAB1 isoforms as examples.

(i) In GSE54680 dataset, the top 5 significant upregulated transcripts in 10ºC subset are AT3G50970.1 (Low Temperature-Induced 30, LTI30, XERO2), AT4G14690.1 (Early Light-Inducible Protein 2, ELIP2), AT1G09350.1 (Galactinol Synthase 3, GOLS3, ATGOLS3), AT5G52310.1 (Low- Temperature-Induced 78, LTI78, COR78), and AT1G20440.1 (Cold-Regulated 47, COR47, ATCOR47) (Figure 6A). As their names, LTI30, LTI78 and COR47 are up-regulated by cold stress. LTI30, a late embryogenes abundant protein (LEA)/dehydrin, can bind and protect membranes to enhance freezing stress resistance in Arabidopsis (Puhakainen et al, 2004). ELIP 2, a thylakoidal protein, was not only induced by cold but also by UVB and high light for protective photoinhibition under stresses (Hayami et al, 2015). Raffinose family oligosaccharides (RFO) were identified involving in tolerance to drought, high salinity and cold stresses. GOLS catalyzes the first step in the biosynthesis of RFO. In Arabidopsis, AtGOLS1 and 2 were induced by drought and high-salinity stresses, by contrast, AtGOLS3 was induced by cold stress only.

(ii) In GSE66737 dataset, we used HAB1 gene, a group A protein phosphatases 2C (PP2C), to demonstrate the biological importance on isoform expressions in PSRN (Figure 6B). HAB1 is a co-receptor of abscisic acid (ABA) and important negative regulator of ABA signaling. HAB1 gene has three alternatively spliced isoforms: HAB1.1 (AT1G72770.1), HAB1.2 (AT1G72770.2), and HAB1.3 (AT1G72770.3) in Arabidopsis. HAB1.1 interacts with the Open Stomata1 (OST1) and inhibits its kinase activity switching the ABA signaling off. In contrast, HAB1.2 encodes a nonfunctional truncate protein, thereby keeping ABA signaling on. Thus, accurately regulating the HAB1.1 and HAB1.2 ratio is important for fine-tuning of ABA signaling and plant adaptation to stress. An RNA-binding motif (RBM)-containing protein RBM 25 is known as a critical key regulator of HAB1 isoforms through binding to last intron of HAB1 pre-mRNA in Arabidopsis. A loss-of-function mutation in RBM25, rbm25-1, resulted in an increase of the ratio HAB1.2/HAB1.1 and ABA-hypersensitive phenotypes (Zhan et al., 2015). As shown in Figure 4, HAB1.1 (At1g72770.1) is significantly down-regulated while HAB1.2 (At1g72770.2) is significantly up-regulated in rbm25-1 mutation seedlings, which is consistent with the previous reports.

Figure 6. (A) The expression of cold-related transcripts of the rosette of Arabidopsis thaliana treated with two temperature condition: 10 and 16 ºC. The top 5 significant upregulated transcripts in 10 ºC subset are AT3G50970.1, AT4G14690.1, AT1G09350.1, AT5G52310.1, and AT1G20440.1 (P-value = 4.41E-78 ~ 3.09E-57; FDR = 1.68E-73 ~ 2.34E-53). According to TAIR database, AT3G50970.1, AT5G52310.1, and AT1G20440.1 are Low Temperature-Induced 30 (LTI30, XERO2), Low Temperature-Induced 78 (LTI78, COR78), and Cold-Regulated 47 (COR47, ATCOR47), respectively. In addition, AT4G14690.1 is Early Light-Inducible Protein 2 (ELIP2), and AT1G09350.1 is Galactinol Synthase 3 (GOLS3, ATGOLS3). Limited to the layout width, only parts of the expression profiles are shown.
(B) The expression of HAB1 isoforms of the seedlings treated with ABA in Arabidopsis thaliana. The group A PP2C (protein phosphatases 2C) HAB1 gene has three transcript isoforms: HAB1.1 (AT1G72770.1), HAB1.2 (AT1G72770.2), and HAB1.3 (AT1G72770.3). Based on the P-value calculated by PSRN, HAB1 isoform AT1G72770.1 is significantly down-regulated (P-value = 9.33E-03) in rbm25-1 mutation seedlings compared with the wild type, while AT1G72770.2 is significantly up-regulated (P-value = 2.01E-3) in rbm25-1 mutation seedlings.