Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data

Bladder cancer (BC) is a highly malignant and malevolent type of tumor whose incidence and mortality worldwide remains high. This paper aimed to identify the differentially expressed genes (DEGs) between normal and primary bladder cancer samples, which might be of prognostic value, and evaluate their association to clinical outcomes (survival) in patients with primary bladder cancer. A sample of 256 gene expression profiles, of which 10 were normal bladder mucosae, whereas 165 were primary bladder cancer tissues, were downloaded from the gene expression omnibus (GEO). The limma package in R was used to perform screening for DEGs after variance reduction through quantile normalization and log2 transformation. Gene Ontology (GO), KEGG enrichment analysis, and Network analysis were performed to determine the functions of these DEGs. Cox proportional hazard regression model was then used to determine the association between the top selected DEGs and overall primary bladder cancer survival. Selected genes were then encoded as either low, middle, or high expression, and Kaplan-Meier plots were then used to visualize the effect of these expression levels for the selected five genes on bladder cancer survival probability. This study identified a total of 5 979 DEGs where 2 878 were upregulated, 3 101 were downregulated, and 37 169 were not significantly expressed differently. Cox proportional hazard stepwise regression models revealed several DEGs that have a statistically significant association with bladder cancer. The selected five DEGs associated with survival were EDNRB (HR=1.9858, 95%CI 1.4965–2.6350), COL14A1 (HR=1.5335, 95%C1 1.1670–2.0151), MOXD1 (HR=1.4878, 95%CI 1.1070–1.9996), FAM107A (HR=1.4970, 95%CI 1.1162–2.0077), and REM1 (HR=1.5999, 95%CI 1.2422–2.0605). After encoding these genes as either low, middle, or high expression, high expression of both EDNRB, COL14A1, FAM107A, and REM1 were found to result in statistically significant reductions in survival probabilities of patients (p<0.05). Contrastingly, high expression of MOXD1 did not contribute to a statistically significant reduction in patients' survival probability (p>0.05). The genes found in this study could provide useful insight into understanding the association between the identified DEGs and primary bladder cancer survival, which can further inform future studies and clinical therapies


Introduction
Cancer is a "genetic disorder driven by the progressive accumulation of multiple genetic and epigenetic changes" (W. J. Kim et al., 2010). These hereditary and epigenetic vicissitudes often at the molecular level result in the proliferation of cells, the spread of these cells to other parts of the organism, and depreciated cell death. The changes that occur in an individual's gene expression profile play a vital role in determining tumor biology, including such aspects as the growth of the tumor, progression, recurrence, and metastasis, all of which might have statistically significant effects on the overall patient survival probability. Additionally, changes in an individual's gene expression profile are suggestive of probable underlying gene mutations (Daneshmand, 2019;W. J. Kim et al., 2010).
Globally, based on recently published estimates, bladder cancer is among the top ten cancers that are most common in the world, with new cases estimated at 550 000 per annum (Richters et al., 2019). While an extravagant portion of the burden of this cancer falls, for the most part, in the developed countries, shifts are expected in the course of the next few decades due to the changing world demographics, including growing and aging populations in the African continent (Richters et al., 2019). Previously conducted studies have found high bladder cancer incidences in North America, Western Europe, and Southern Europe, whereas high mortalities have been reported in Northern Africa (Wong et al., 2018). Malats and Real (2015) reported high incidences in older men results which conform with the published bladder cancer estimates from the American Cancer Society (2020), where the average age at diagnosis has been reported as seventythree years, with men reporting significantly higher incidences (62, 100 new male cases against 19, 300 cases in women) and mortality than women (13, 050 male deaths against 4, 930 deaths).
Throughout the years, technological advancements have resulted in the development of novel and high throughput technologies such as gene microarray sequencing technologies that allow the analysis of several patient gene expression profiles, particularly in carcinogenesis and progression of cancers (Huang et al., 2018). Additionally, the identification of novel biomarkers of disease (cancer) progression (Botling et al., 2013;W. T. Kim et al., 2010;Sun et al., 2010) can be carried out through the use of gene expression profiles as well as clinical cases (Shukla et al., 2017;Tang et al., 2017). Over the last decade, the systematic profiling of samples from humans with cancers has been established and authenticated as an excellent method of obtaining the molecular characteristics of various tumors and their subtypes (Martinez-Romero et al., 2018).
In previously published studies on bladder cancer, evidence of the effect of genetic mutations has been reported ranging from significant associations between gene mutations and bladder cancer (Mikhailenko & Nemtsova, 2016; to non-substantial evidence (Allory et al., 2014). Additionally, some papers have reported both significant and null associations between some genetic mutations and bladder cancer (K. . While studies have tried to explore the genetic variations resulting in atypical gene expression profiles in individuals with bladder cancer, most studies have focused on differentially expressed genes (DEGs) and the identification of genetic biomarkers with little or no focus on the associations between these genes and overall cancer-specific survival in samples obtained from patients with primary bladder cancer (Aljabery et al., 2018;Gao et al., 2018;Shi & Tian, 2019). In Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data IJURCA|2|Okutse & Nyongesa studies in which survival was considered (Buttigliero et al., 2017;Chen et al., 2019;W. J. Kim et al., 2010;Seiler et al., 2017), specific focus on the influence of these specific top differentially expressed genes on primary bladder cancer survival is either obscured or non-existent or focused on the analysis of a single gene (Theodorescu et al., 2004).
In the current paper, an analysis of differentially expressed genes in primary bladder tumor samples from microarray experiments was carried out to identify the top genes expressed differently between normal bladder samples and primary bladder samples. A statistical analysis of these genes was then carried out to identify whether these genes statistically significantly influenced overall survival in primary bladder cancer patients from whom the samples were obtained. Identifying the specific impact of the DEGs on bladder cancer survival may play a crucial role in prognosis and the discovery of novel drug targets for the disease. Additionally, this analysis may enable the discovery of the effect of low, middle, and high gene expression of the selected top prognostic genes on primary bladder cancer patient survival.

Methodology
Data source. This study sought to identify the associations between the top differentially expressed genes between healthy and primary bladder cancer patients and the association between these genes, bladder cancer prognosis, and overall survival using microarray gene expression profiles. A total of 256 samples prepared using the Illumina Human-6 expression BeadChip Version 2.0 from which 165 were primary bladder cancer samples, ten normal bladder mucosae, 23 recurrent non-muscle invasive tumor samples, and 58 normallooking bladder mucosae were downloaded from the Gene Expression Omnibus (GEO), with accession number GSE13507 (W. J. Kim et al., 2010).

Data Preprocessing. Samples from the
downloaded dataset with no clinical information were filtered out of the downloaded dataset to remain with 165 samples derived from patients with primary bladder cancer and ten healthy bladder samples with survival information. Variance reduction between the microarrays was carried out using quantile normalization (Bolstad et al., 2003;Calza et al., 2008). Gene expression values were then log2transformed to enhance normality and reduce heterogeneity (Quackenbush, 2002).

Differential gene expression analysis.
Analysis of differentially expressed genes from the selected samples was carried out using the limma package in R (Ritchie et al., 2015). Linear models for each of the genes in the given series of arrays was fit using the 'lmFit' function. The fitted microarray linear model was then used to calculate and derive modulated statistics, including t-statistics, log-odds of differential expression, and Fstatistics using the empirical Bayes statistics for differential expression (eBayes) with 0.01 as an assumed proportion of DEGs.
Upregulated and downregulated DEGs were selected based on the rule: |log(foldchange)|>0 and |log(foldchange)|<0 respectively. Similarly, the threshold, log(foldchange)=0, was used to determine DEGs that were not statistically significantly differentially expressed between normal and primary bladder cancer samples. The limma function 'topTable' was used to retrieve the top-ranked differentially expressed genes from the empirical Bayesian model. Benjamin and Hochberg (False Discovery Rate, FDR) adjustments were applied to the p-values.
Functional annotation and enrichment analysis. ClusterProfiler, an R package for Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data IJURCA|3|Okutse & Nyongesa the analysis and visualization of the functional annotations for genes and clusters, was used to perform enrichment analyses (Yu et al., 2012). The 'enrichKEGG' function from the package was used to perform enrichment analysis based on the Kyoto Encyclopedia for Genes and Genome (KEGG). In contrast, Gene Ontology (GO) categories were returned using the 'enrichGO' function with False Discovery Rate (FDR) control. Probability values were Benjamini-Hochberg (BH) adjusted with the cut-off for the most significant pathway analysis results for KEGG and GO categories obtained using the thresholds p<0.05 and p<0.01, respectively. Visualizations of the enrichment results for both GO and KEGG were done using dot plots. Additionally, networks showing gene linkages and biological functions were constructed using the 'cnetplot' function from the package 'enrichplot' in R (Yu, 2019). These visualizations are significant in visualizing the enriched pathways and genes belonging to several annotation classes. Cox proportional hazard model for the identification of survival associated genes in primary bladder cancer. The logtransformed probe intensity data were Zscore transformed to allow comparability of the results from multiple experiments regardless of the initial hybridization concentrations after standardization (Cheadle et al., 2003). The survival package (Borgan, 2001) and the RegParallel package (Blighe, 2019) from Bioconductor were used to construct a multivariable Cox proportional hazard model and allow analogous processing over the considerably large microarray gene expression dataset, respectively.
The DEGs which satisfied the threshold LogRank<0.01 (top hits) were retrieved and annotated using the biomaRt package (Durinck et al., 2009). Attributes including platform information, Entrez gene IDs, gene biotype, and the external gene names for the survival associated genes were retrieved for further analysis. The DEGs that were associated with bladder cancer survival were prepared for downstream analysis and encoded with regards to their expression levels as either high expression (Z-score cut-off>=1.0), low expression (Z-score cut-off <=-1.0), or median expression (Z-score=0). The survminer package (Alboukadel et al., 2018) was used to construct Kaplan-Meier (KM) plots for visualizing the effect of the selected top differentially expressed genes on the overall primary bladder cancer survival.

Statistical analysis and data availability.
All statistical analysis procedures were carried out using the R statistical programming environment version 3.6.3 (https://www.rproject.org/). The packages used in this analysis are available and freely downloadable from Bioconductor (https://www.bioconductor.org/) and The Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/). The dataset for this analysis and the related clinical information for the 165 primary bladder cancer samples were downloaded from the Gene Expression Omnibus (GEO) with accession number GSE13507 available at https://www.stva.ncbi.nlm.nih.gov/geo/query/acc.cgi.

Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data
IJURCA|4|Okutse & Nyongesa  DEGs' biological functions were identified through enrichment analysis, and functional annotation was based on the GO terms. Gene Ontology enabled the exploration of the role distribution of these genes in primary bladder cancer. KEGG pathway enrichment analysis for the top 300 DEGs allowed identifying the most significant pathways involved in primary bladder cancer. GO categories are separated into three groups, namely, molecular function (MF), biological processes (BP), and cellular component (CC). The top 5 most significant GO terms under each category are summarized in Table 2. For molecular function, the most significantly enriched GO terms were receptor activity (GO:0004872, p=6.8123E-05) and carbohydrate-binding (GO:0030246, p=1.0526E-04), extracellular matrix organization for biological process (GO:0030198, p=4.0049E-05) and extracellular region for cellular component (GO:0005576, p=9.1054E-11).
After KEGG enrichment analysis, the most significant pathways were retrieved using a pvalue cut-off of 0.05. The topmost significantly enriched pathways ( Figure 3) were cell adhesion molecules (p=3.3007E-04), PI3K-Akt signaling pathway (p=0.0021), Basal cell carcinoma (p=0.0034), Pathways in cancer (p=0.0069), and Ether lipid metabolism (p=0.0099). The KEGG enrichment results are summarized in Table  3.  Network construction was based on the top 300 DEGs and was constructed using the ClusterProfiler cnetplot function. The network was constructed based on the GO enrichment analysis results and was used to visualize the links between the DEGs and their biological functions. Through the network results, genes, including LAMA2, BCL2, and FLRT2, were observed to belong to various annotation groups (Figure 2).

Figure 2:
A visual representation of the linkages of the differentially expressed genes (DEGs) and their biological functions as a network. Each node represents an enriched pathway with its size corresponding to the number of genes in that enriched pathway. The edges are linked to each of the genes and overlap where the genes belong to multiple annotation categories. Genes that belong to the same annotation category tend to be clustered around that node.

Proportional Cox hazard survival model.
Cox proportional hazard survival models were constructed to identify the roles of the Differentially Expressed Genes (DEGs) in overall primary cancer survival. Multiple survival analysis using RegParallel revealed a total of 8, 746 statistically significantly DEGs associated with survival in primary bladder cancer "Supplementary file 1: Cox proportional hazard model results." 2, 623 DEGs satisfied the set threshold LogRank<0.01 "Supplementary file 2: Reduced proportional hazard model with LogRank<0.01" and were annotated using the biomaRt package. The list of the retrieved DEGs that were associated with survival in primary bladder cancer after annotation were summarized in "Supplementary file 3: biomaRt annotated survival associated genes." Of these genes, 50 genes corresponded to those found in the top 300 differentially expressed genes identified through differential expression analysis using the limma package. These genes were extracted and summarized as presented in Table 4 and "Supplementary file 4: Top 50 survival associated genes in primary bladder cancer." The top 5 DEGs out of the 50 that this study identified to be significantly differentially expressed between normal and cancerous cell samples and as markers of overall survival in primary bladder cancer Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data IJURCA|9|Okutse & Nyongesa were then encoded to visualize their effects using Kaplan Meier plots. The top five of these genes were EDNRB, COL14A, MOXD1, FAM107A, and REM1.
High expression of EDNRB was statistically significantly associated with reduced survival in primary bladder cancer (p=0.0003) ( Figure 3A). Additionally, the high expression of COL14A1 ( Figure 3B) and FAM107A were statistically significantly associated with a reduced survival probability in patients with primary bladder cancer (p=0.0052 and p=0.0066, respectively). Figure 3C presents the effects of low, middle, and high expression of FAM107A, respectively.
High expression of REM1 was also associated with a statistically significant reduction in the survival probability of patients with primary bladder cancer (p=0.0091; Figure 3D). Contrastingly, the high expression of the DEG MOXD1 was not associated with significant reductions in the survival probability of patients with primary bladder cancer (p=0.2200; Figure  4).

Discussion
Technological advancements over the years have made it possible for efforts geared towards the particular understanding of the genetic links between several cancers. While studies have identified biomarkers previously, there is still a need for new survival markers in primary bladder cancer. In this paper, survival associated genetic biomarkers that are differentially expressed between normal and primary bladder cancer samples have been proposed using a dataset downloaded from the GEO database. The Cox proportional hazard model was used to determine the particular genes that influence overall survival in patients with tumors of the bladder.
Analysis of gene expression profiles revealed a total of 5 979 differentially expressed genes between normal and primary bladder cancer samples. The top five upregulated genes were CDC20, PMM2, IQGAP3, and GSK3B, whereas PLPPR4, CYTL1, CD160, ANO4, and PCSK2 were the top five most downregulated genes. CDC20, which was identified as the most significantly upregulated gene, has previously been proposed as a potential, quiescent and novel ameliorative target notably in cancer doctoring (L. Z. Wang et al., 2013), having been previously obtained as a gene of interest in human bladder carcinoma (Choi et al., 2013). CDC20 was also identified in advanced distant metastasis and poor overall primary bladder cancer survival in humans (Choi et al., 2013). On the other hand, the top downregulated DEG was PLPPR4 has no prognostic role in primary bladder cancer and results from genomic mutations. CYTL1, the second-top most DEG, has been previously highlighted as being differently methylated in bladder urothelial carcinoma (X. Wang et al., 2019).    KEGG and GO enrichment analysis and functional annotation was performed to demonstrate the functions of the DEGs. The most significantly enriched KEGG pathways based on the top 300 DEGs were cell adhesion molecules (CAMs) and the P13K-Akt signaling pathway. As such, anomalous pathways can be regarded as one of the causes of bladder tumors, as CAMs have been previously linked to invasion and metastasis in human malignancies, and in particular, tumors of the genitourinary tract. CAMs are significant in the interaction between cells as well as the communication amid cells and other apparatuses of the extracellular matrix (Bryan, 2015;Cohen et al., 1997). Furthermore, the P13K-Akt signaling pathway is responsible for regulating cell growth, metastasis, and survival. Alterations in these pathway functions might result in the development of cancers (Sathe & Nawroth, 2018).
DEGs were significantly enriched in GO terms of 'extracellular matrix organization,' 'extracellular region' and 'receptor activity.' Studies have previously reported the role of these terms in bladder cancer (Brunner & Tzankov, 2007;Dozmorov et al., 2006).
The 300 of the selected top differentially expressed gene expression profiles obtained from microarray data were then subjected to proportional Cox hazard models to identify the differently expressed genes and their effect on overall primary bladder cancer survival. DEGs were retrieved based on the defined threshold defined in the previous section. The twenty of the genes selected as estimators of survival in primary bladder cancer were EDNRB, COL14A1,MOXD1,FAM107A,REM1,FBLN5,CADM3,VCAM1,PAPPA,RBP4,PI16,IL6ST,TLR1,F10,OLFML1,LAMA2,CHN1,GNG2,RHOJ,and SCN2B. Five genes, that is, EDNRB, COL14A1, MOXD1, FAM107A, REM1, were selected for further exploration.
The endothelin B receptor, also called EDNRB, is a "G protein-coupled receptor" responsible for activating the second phosphatidylinositol-calcium messenger system.
The role of EDNRB in cancer has been previously studied, with findings by Davenport et al. (2016) suggesting the downregulation of the gene by promoter hypermethylation during tumor development (tumorigenesis). The downregulation of EDNRB results in the alteration of the ET1 signaling pathway (Davenport et al., 2016). The subsequent modifications in this pathway have also been reported to result in the proliferation of tumors, metastasis, and angiogenesis (Lahav et al., 1999;Mallat et al., 1995;Zhao et al., 2009). This paper found that high expression of EDNRB results in a significant reduction in overall bladder cancer survival based on the proportional Cox hazard regression model results. This is particularly due to the impact that EDNRB has on the ET1 signaling pathway, whose alliteration results in the propagation of the growth of tumors.
Similarly, collagen, type XIV, alpha 1 (COL14A1) was found to significantly reduce the overall survival of patients with primary bladder cancer. The role of COL14A1 in cancer development has been reported elsewhere (Morris et al., 2010). In this paper, we explain its role in bladder cancer tumorigenesis and progression based on its impact on the process of DNA methylation. Methylation frequency was observed to increase with the overexpression of COL14A1 in bladder cancer. Methylation results in the addition of a methyl group to a Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data IJURCA|15|Okutse & Nyongesa DNA molecule. This situation can result in a change in a DNA section's activity without the sequence being altered.
DNA methylation on a gene promoter contributes to the repression of gene transcription. In tumorigenesis, altered methylation patterns have been implicated. In particular, the destruction of genes involved in suppressing the growth of tumors upsets the balance that facilitates cell proliferation regulation. This situation results in the propagation of tumors into complete malevolent status resulting in reduced overall patient survival (Wajed et al., 2001). Additionally, COL14A1 has been previously reported to interact with decorin, which results in the downregulation of receptors (including IGF-I and LRP) that have been implicated in reduced survival and increased growth of cancers (Brandan et al., 2006;Schaefer et al., 2007).
Additionally, 'family with sequence similarity 107 member A' (FAM107A) was found to contribute significantly to increases in overall survival of patients with primary bladder cancer in low expression. This can be argued to result from the effect that the gene has in terms of reducing cell growth even though the role of the gene remains controversial (Nakajima et al., 2012). In studies in which the role of FAM107A was investigated, varied evidence ranging from the gene suppressing tumor development through inducing apoptosis to the gene driving tumor invasion in gliomas have been reported (Le et al., 2010;Liu et al., 2009). According to a previously conducted study in which the diagnostic value of FAM107A in discriminating bladder cancer from hematuria was investigated, high ratios of this gene were associated with bladder cancer (Xu et al., 2019).
On the other hand, the high expression of 'Monooxygenase DBH like 1' (MOXD1) did not significantly affect the overall survival probability of patients with primary bladder cancer. This gene was also found to be significantly downregulated between normal and primary bladder cancer samples. In previous studies, the knockdown of the gene MOXD1 has been reported to annihilate and suppress the spread of cancerous cells in osteosarcoma through the induction of cellular death (apoptosis) (Han et al., 2019). In this way, the role of MOXD1 in enhancing the survival of patients with bladder cancer in high expression can be argued to be a result of the induction of the death of cancerous cells that the gene promotes.
High expression of 'GTP-binding protein REM 1' was also associated with reduced survival probability in bladder cancer. The role of REM1 in cancer has been studied previously, where the gene has been linked to angiogenesis and Ca 2+ signaling. Similarly, the role that angiogenesis plays with respect to tumor growth and metamorphosis and that of calcium signaling in the growth of cancerous cells has been tackled elsewhere (Capiod et al., 2007;Folkman, 2002).
While studies have previously used the GSE13507 dataset, their findings have been quite limited and varied in terms of the genes that they reported. For instance, W. J. Kim et al. (2010) focused on identifying genes that could be used in the prediction of bladder cancer progression. The authors reported genes, including S100A8, HMOX1, MTAP, KIF1A, COCH, CELSR3, MGCIT624, and PFKFB4, as possible predictors of muscle-invasive bladder cancer progression. On the other hand, ee et al. (20) focused on E2F1 as well as its associated genes as predictors of progression of bladder cancer from superficial to invasive. In this paper, the same dataset has been used with attention paid to the effect of genes (including EDNRB, COL14A1, MOXD1, Differential Expression Analysis for the Identification of Survival Associated Genes in Primary Bladder Cancer using Microarray Data IJURCA|16|Okutse & Nyongesa FAM107A, and REM1), which have had a minimal investigation in terms of their influence on the overall survival of patients with primary bladder cancer. Patterns in the expression of these genes in primary bladder cancer that had previously been obscured are unmasked. Additionally, this study adds to the fund of knowledge on prognostic and survival genetic markers of primary bladder cancer.
In summary, this paper involved a comprehensive analysis of DEGs between normal and primary bladder cancer samples. Proportional Cox hazard regression models were then used to study the effect of these DEGs on the overall survival of patients with primary bladder cancer. The role of the selected DEGs in cancer has also been enumerated. Additionally, the function of the selected pathways in bladder cancer tumorigenesis and progression have been discussed. The genes identified in this paper are likely to inform future research aimed at understanding their genetic link to bladder cancer and overall effects on patient survival probability. Also, biological and therapeutic information can be derived from this research as gene expression analysis has been known from previous studies to inform clinical and therapeutic technologies. For instance, gene therapy, such as Gendicine TM, was made possible following findings from genetic and genomic studies (Raty et al., 2010). While this study adds substantial knowledge on survival markers for primary bladder cancer, further research should be carried out employing considerably large sample sizes to confirm the significance of these survival-associated DEGs.

Conclusion
This study identified 5 979 genes expressed differently between normal and primary bladder cancer samples derived from microarray gene expression profiling and downloaded from the Gene Expression Omnibus. 50 DEGs were identified as having a statistically significant effect on the overall survival probability of patients with primary bladder cancer. These results will provide invaluable insight for future research studies in the field, in addition to aiding clinical therapies as markers of bladder cancer survival.

Ethics and consent
Not applicable.

Competing interests
The authors have no competing interests to declare.

Author's contributions
AOO performed the study design, data collection, literature searches, statistical analysis, data interpretation, and manuscript preparation. KWN oversaw the statistical analysis, data interpretation, and manuscript review. All author's read and approved the final manuscript.