Novel tools for the prediction of promoters in plants and bacteria

  • I. A. Shahmuradov Institute of Molecular Biology and Biotechnologies, ANAS, Azerbaijan, AZ1073, Baku, Matbuat Ave., 2a

Abstract

Aim. The computational search for promoters remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been solved satisfactorily. This study aimed to develop novel computer tools for the prediction of promoters (transcription start sites, TSSs) in plants and bacteria. Results. Two novel tools have been developed: TSSPlant, for the prediction of RNA polymerase II promoters in plants, and bTSSfinder, for the prediction of promoters in bacteria. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA and TATA-less promoters; it is available for download as a standalone program at http://www.cbrc.kaust.edu.sa/download/. bTSSfinder predicts promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of σ factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared with currently available tools, bTSSfinder achieves the highest accuracy. bTSSfinder is available as a standalone program and online at http://www.cbrc.kaust.edu.sa/btssfinder. Conclusions. To date, TSSPlant and bTSSfinder are the most accurate promoter predictors for plants and bacteria, respectively.

Keywords: transcription, RNA polymerase, promoter, TSS, promoter prediction.
