※ Computational resources of protein phosphorylation and dephosphorylation:
Introduction:
Protein phosphorylation and dephosphorylation are among the most ubiquitous post-translational modifications (PTMs), and play important roles in signalling response, cell differentiation, apoptosis and other essential biological processes. Identification of site-specific dephosphorylated substrates is fundamental for understanding the molecular mechanisms of dephosphorylation. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation and dephosphorylation, including phosphorylation databases, dephosphorylation databases, prediction of dephosphorylation sites, and other tools.
We apologize that computational studies without web links to databases or tools are not included in this compendium, since they are not easy for experimentalists to use directly. We are grateful for user feedback. Please contact Dr. Yu Xue or Cheng Han to add, remove or update one or more of the web links below.
Index:
<1> Phosphorylation Databases
<2> Dephosphorylation Databases
<3> Prediction of dephosphorylation sites
<4> Miscellaneous tools
<5> Implemented tools
==================================================================================
<1> Phosphorylation Databases:
1. Phospho.ELM 9.0 (PhosphoBase) : contains 8,718 experimentally verified phosphorylated proteins from different species with 3,370 tyrosine, 31,754 serine and 7,449 threonine sites (Diella, et al., 2004; Diella, et al., 2008; Dinkel, et al., 2011).
2. PhosphoSitePlus : a new version of PhosphoSite, a web-based database that collects protein modification sites, including protein phosphorylation sites from the scientific literature as well as high-throughput discovery programs. Currently, PhosphoSitePlus contains over 120,000 phosphorylation sites (Hornbeck, et al., 2012).
3. PhosphoNET : PhosphoNET presently holds data on over 950,000 known and putative phosphorylation sites (P-sites) in over 23,000 human proteins that have been collected from the scientific literature and other reputable websites. Over 19% of these phospho-sites have been experimentally validated. The rest have been predicted with a novel P-Site Predictor algorithm developed at Kinexus with academic partners at the University of British Columbia and Simon Fraser University.
4. PhosphoPep : A project to support systems biology signaling research by providing interactive interrogation of MS-derived phosphorylation data from 4 different organisms (Bodenmiller B, et al., 2011).
5. PhosPhAt 4.0 : contains information on Arabidopsis phosphorylation sites which were identified by mass spectrometry in large scale experiments from different research groups with 60,366 phospho-peptides matching to 8,141 nonredundant proteins (Heazlewood, et al., 2008; Durek, et al., 2010; Zulawski, et al., 2013).
6. P(3)DB 3.5 : hosts protein phosphorylation data for 9 species from 32 experimental studies, containing 16,477 phosphoproteins, harboring 47,923 phosphosites. Centered on these phosphorylation data, multiple related data and annotations are provided, including protein-protein interaction (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification and Kinase Client Assay (KiC Assay) data. In addition, it incorporates multiple network viewers for the above features such as PPI network, kinase-substrate network, phosphatase-substrate network, and domain co-occurrence network (Gao, et al., 2009; Yao, et al., 2012; Yao, et al., 2013).
7. UniProt : for each protein annotation, the "Amino acid modifications" in the "Sequence annotation (Features)" section collects the post-translational modification information of proteins (UniProt Consortium, et al., 2021).
8. dbPTM : an informative resource of experimental post-translational modifications (PTMs) obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles for investigating the substrate specificity of PTM sites and functional association of PTMs between substrates and their interacting proteins (Huang, et al., 2019; Lee, et al., 2006; Lu, et al., 2013).
9. Phospho3D 2.0 : a database of three-dimensional structures of phosphorylation sites which stores information retrieved from the Phospho.ELM database and which is enriched with structural information and annotations at the residue level (Zanzoni, et al., 2007; Zanzoni, et al., 2011).
10. PhosSNP 1.0 : a genome-wide analysis of genetic polymorphisms that influence protein phosphorylation in H. sapiens. It was estimated that ~69.76% of nsSNPs (non-synonymous SNPs) are potential phosSNPs (phosphorylation-related SNPs), amounting to 64,035 phosSNPs in 17,614 proteins (Ren, et al., 2010).
11. TAIR : maintains a database of genetic and molecular data for Arabidopsis thaliana. Protein data available from TAIR includes the complete protein sequence along with phosphorylation site annotations (Lamesch, et al., 2011).
12. dbPPT 1.0 : a comprehensive resource of plant protein phosphorylation that contains 82,175 phosphorylation sites in 31,012 proteins from 20 plant organisms. The phosphorylation sites in dbPPT were manually curated from the literature, while datasets in other public databases were also integrated (Cheng, et al., 2014).
13. EPSD : a comprehensive data resource updated from two databases, dbPPT and dbPAF, which contained 82,175 p-sites of 20 plants and 483,001 p-sites of 7 animals and fungi, respectively (Lin, et al., 2020).
14. KinBase : holds information on over 3,000 protein kinase genes found in the human genome and many other sequenced genomes (Manning G, et al., 2002).
15. BioGRID : A public database that curates genetic and chemical interactions, as well as proteins with a large number of PTM sites (Oughtred R, et al., 2019).
16. PubMed : Comprises more than 34 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.
17. PDB : A leading resource of structural data of biological macromolecules (Berman HM, et al., 2000).
<2> Dephosphorylation Databases:
1. DEPOD : A manually curated resource that harbors human phosphatases, their protein and non-protein substrates, dephosphorylation sites and the associated signaling pathways (Damle, et al., 2019).
2. PhosphoBase : holds information on over 1,200 protein phosphatase genes found in the human genome and many other sequenced genomes (Chen, et al., 2017).
3. iEKPD : contains 197,348 phosphorylation regulators, including 109,912 protein kinases, 23,294 protein phosphatases and 68,748 PPBD-containing proteins in 164 eukaryotic species (Guo, et al., 2019).
<3> Prediction of dephosphorylation sites:
1. PTP predictor : A method that uses the k-nearest neighbor algorithm to identify the substrate sites of three protein tyrosine phosphatases based on the sequence features of dephosphorylation sites (Wu, et al., 2014). The tool is not available.
2. DephosSite : A machine learning approach for predicting the substrate dephosphorylation sites of three specific phosphatases PTP1B, SHP-1, and SHP-2 (Wang, et al., 2016). The tool is not available.
3. DephosSitePred : An SVM approach for predicting the substrate dephosphorylation sites of three specific phosphatases PTP1B, SHP-1, and SHP-2 based on bi-profile sequence features (Jia, et al., 2017). The tool is not available.
4. DTL-DephosSite : A deep learning approach for predicting the substrate dephosphorylation sites using a Bi-LSTM deep learning architecture and transfer learning (Chaudhari, et al., 2021).
5. DephosNet : A deep learning framework that leverages transfer learning to enhance dephosphorylation site prediction (Yang, et al., 2023). The tool is not available.
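Several of the predictors above classify a candidate site from the residues flanking it. The Python sketch below illustrates that general idea only: it cuts a fixed-width window around a site and applies a toy k-nearest neighbor vote. It does not reproduce any of the listed tools. The function names, the flank width, the padding character, the identity-count similarity and the example windows are all assumptions made for illustration.

```python
from collections import Counter


def extract_window(sequence, position, flank=7, pad="-"):
    """Return the residues within `flank` of a 1-based `position`.

    Windows that run past a terminus are padded with `pad` so that
    every window has length 2 * flank + 1 (an illustrative convention).
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position is outside the sequence")
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    return (pad * (flank - len(left)) + left + sequence[idx]
            + right + pad * (flank - len(right)))


def identity_count(a, b):
    """Count aligned positions at which two windows carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


def knn_predict(query, labelled_windows, k=3):
    """Majority label among the k labelled windows most similar to `query`."""
    ranked = sorted(labelled_windows,
                    key=lambda item: identity_count(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled windows, invented purely for this example.
toy_training = [
    ("AAAYAAA", "substrate"),
    ("AAAYAAC", "substrate"),
    ("GGGYGGG", "non-substrate"),
    ("GGGYGGC", "non-substrate"),
]

window = extract_window("MKTAYIAKQR", 5, flank=3)
prediction = knn_predict("AAAYAAG", toy_training, k=3)
```

In practice the published methods use richer sequence encodings and curated training sets; readers should consult the cited papers for the actual features and parameters.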
<4> Miscellaneous tools:
1. GPS 6.0 : a general model was pre-trained using 490,762 non-redundant p-sites in 71,407 proteins. Transfer learning was then conducted to obtain 577 PK-specific predictors at the group, family and single PK levels, using a well-curated data set of 30,043 known site-specific kinase-substrate relations (ssKSRs) in 7,041 proteins (Xue, et al., 2023).
2. GPS 2.1 : an older version of the GPS system, which we renamed the Group-based Prediction System. GPS 2.1 was implemented in Java and can hierarchically predict kinase-specific phosphorylation sites for 408 human protein kinases (Xue, et al., 2011).
3. DOG 2.0 : prepares publication-quality figures of protein domain structures. The scale of a protein domain and the position of a functional motif/site will be precisely calculated (Ren, et al., 2009).
4. HemI : an easy-to-use tool that can visualize either gene or protein expression data in heatmaps. The heatmaps can be recolored, rescaled or rotated in a customized manner. In addition, HemI provides multiple clustering strategies for analyzing the data. Publication-quality figures can be exported directly (Deng, et al., 2014; Ning, et al., 2022).
<5> Implemented tools:
1. Echarts : An Open Source JavaScript Visualization Library.
2. IUPred : The web server takes a single amino acid sequence as an input and calculates the pairwise energy profile along the sequence (Dosztányi, et al., 2021).
3. 3Dmol.js : A modern, object-oriented JavaScript library for visualizing molecular data.