This section briefly outlines the biological significance of protein phosphorylation and other modifications, their importance in understanding the regulation of complex biological systems (sections 1 – 5), and the features of PhosphoSitePlus® (sections 6 – 18).
Cellular communication, consisting of a complex and dynamic web of directed biochemical and cell biological alterations that coordinate and regulate biological systems, may be considered the ‘natural language’ of the cell. Post-translational modifications (PTMs) of proteins are indispensable elements of the semantics of cellular communication. PTMs are reversible, enzymatically mediated modifications of specific amino acid side chains. Some common types of PTMs include phosphorylation and O-glycosylation on Ser (S) and Thr (T) residues, phosphorylation on Tyr (Y) residues, methylation on Arg (R) residues, and acetylation and ubiquitination on Lys (K) residues.
A modification site, as defined here, consists of the modified residue at the 0 position, plus the seven flanking amino acids N-terminal (positions -7 to -1) and C-terminal (positions +1 to +7) to the modification site. This flanking region may contain certain types of amino acids at specific positions relative to the modified site: the spatial combinations of these specific amino acids define the morphology of the modification site. The morphology of a modification site, like that of a word in spoken natural languages, determines its informational content. For example, DNA-damage kinases have a strong substrate preference for [ST] (Ser or Thr) followed by Q (Gln), while the anti-apoptotic kinase Akt has a strong preference for [ST] preceded by R (Arg) at positions -3 and -5. Thus, proteins that participate directly in the DNA-damage pathway might be expected to contain [st]Q motifs following DNA damage, while those that participate directly in a subset of anti-apoptotic responses may contain XRXRXX[st]X motifs ([st] indicates a modified-Ser or –Thr at position 0).
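The site definition and the two motif examples above can be sketched in a few lines of Python. This is an illustration only: the function names, the `_` padding character, and the toy sequence are assumptions introduced here and are not part of PSP.

```python
import re

FLANK = 7  # residues on each side of the modified residue (position 0)

def site_window(sequence, position, flank=FLANK, pad="_"):
    """Return the 15-residue site (positions -7..+7) around a 1-based position.

    The modified residue is written in lowercase; positions that fall off
    either end of the protein are filled with `pad` (an assumed convention).
    """
    idx = position - 1  # convert to a 0-based index
    if not 0 <= idx < len(sequence):
        raise ValueError("position outside sequence")
    left = sequence[max(0, idx - flank):idx].rjust(flank, pad)
    right = sequence[idx + 1:idx + 1 + flank].ljust(flank, pad)
    return left + sequence[idx].lower() + right

# Patterns are anchored so the lowercase residue sits at the window centre.
MOTIFS = {
    "DNA-damage kinase ([st]Q)": re.compile(r"^.{7}[st]Q"),
    "Akt (RxRxx[st])": re.compile(r"^.{2}R.R.{2}[st]"),  # R at -5 and -3
}

def matching_motifs(window):
    """Names of the motifs that a 15-residue window matches."""
    return [name for name, pattern in MOTIFS.items() if pattern.search(window)]

# Hypothetical toy sequence, not a real protein.
toy = "GSRPRTASFAEGGLSQEDKL"
for pos in (8, 15):
    window = site_window(toy, pos)
    print(pos, window, matching_motifs(window))
```

In this toy sequence, Ser8 has Arg at -5 and -3 and so matches only the Akt-like pattern, while Ser15 is followed by Gln and matches only the [st]Q pattern. A match says only that the flanking sequence is compatible with a kinase preference, not that the kinase acts on the site.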
Protein modification sites can serve as context-sensitive words in the distributed language of the cell. Using another analogy, modification sites serve as nano-switches in the distributed biochemical network of the cell. These switches control the flow of information through a particular protein, or node, of the network, helping to shape dynamic biological processes including genetic silencing, cellular growth, differentiation, and apoptosis. The recognition that errors in cellular information processing can cause diseases such as cancer, diabetes and autoimmunity has focused much biomedical research on understanding the role of protein modifications in cellular communications.
PhosphoSitePlus® (PSP), reengineered from PhosphoSite®, is an open, dynamic, continuously curated, and highly interactive systems biology resource for studying experimentally observed PTMs in the regulation of biological processes. PhosphoSite® was limited to phosphorylation. PSP, while still providing comprehensive coverage of protein phosphorylation, now includes coverage of other commonly studied PTMs including acetylation, methylation, ubiquitination, and O-glycosylation.
PSP includes critical structural and functional information about the topology, biological function and regulatory significance of specific modification sites, and powerful tools for mining and interpreting this data in the context of biological regulation, diseases, tissues, subcellular localization, protein domains, sequences, motifs, etc.
It is our hope at Cell Signaling Technology (CST) that PSP, by providing a reliable and powerful resource focused on the roles of protein modifications in biological control, will accelerate the pace of discovery of basic mechanisms of cellular signaling, further our understanding of cellular regulation in health and disease, and facilitate the discovery of critical disease biomarkers and potential drug targets.
PSP integrates both low- and high-throughput (LTP and HTP) data sources into a single reliable and comprehensive resource. Nearly 10,000 journal articles, including both LTP and HTP reports, have been manually curated by expert scientists from over 480 different journals since 2001. The three journals most represented in PSP are the Journal of Biological Chemistry, Molecular and Cellular Biology, and Oncogene. HTP sites, discovered using mass spectrometry (MS2), come not only from published literature but also from previously unpublished data generated at CST.
CST is in a unique position to make seminal contributions to the field of protein modification proteomics: CST scientists have been at the forefront of the technological revolution in phosphoproteomics in recent years. For example, over the past seven years, CST scientists have identified at least five times more phospho-tyrosine sites than were previously known. Many of these sites have been published in the literature, but because of the length of time that it can take to get results published (6 – 18 months), we have chosen to make many more newly discovered sites available via PSP prior to external publication. In most of these cases, spectra are provided so that the user can evaluate the strength of the peptide and site assignments.
The data in PSP is shared with the research community in multiple ways:
The functionality of PSP is frequently modified and refined. These modifications allow it to adapt the way that data is presented to keep pace with rapidly evolving proteomic knowledge and technology. New features and expanded functionalities include:
It is known that different types of protein modifications influence each other, and that the biological meaning of protein phosphorylation cannot be deciphered without knowing the location, regulatory effects and functional interactions of other types of protein modifications. The same MS2 technology that has enabled the discovery of tyrosine phosphorylation sites is catalyzing the discovery of thousands of sites for other modification types, including acetylation, ubiquitination, and methylation.
In recent years, site discovery using MS2 has far out-paced that from LTP experiments. For example, in 2003, the total number of newly discovered HTP/MS2 sites curated into PSP was just over 100. In 2008, this number had grown to 37,533, a 344-fold increase over 2003. In contrast, the total number of sites newly discovered using LTP technology was 2,616 for 2003, and 4,652 for 2008, an increase of less than two-fold over the same 5-year period. It is notable that in 2003, curated LTP sites outpaced HTP/MS2 sites by over 20-fold. The tables were turned by 2008, when 8-fold more HTP/MS2 sites were curated than LTP sites.
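The fold changes quoted above can be checked with a few lines of arithmetic. In this sketch the 2003 HTP/MS2 count is not a quoted figure: the text says only "just over 100", so the value is back-calculated from the stated 344-fold increase and should be read as an assumption.

```python
# Counts quoted in the text above.
htp_2008 = 37_533         # HTP/MS2 sites curated in 2008
ltp_2003 = 2_616          # LTP sites curated in 2003
ltp_2008 = 4_652          # LTP sites curated in 2008
htp_fold_increase = 344   # stated HTP/MS2 growth, 2003 to 2008

# Assumption: implied 2003 HTP/MS2 count, derived rather than quoted.
htp_2003_implied = htp_2008 / htp_fold_increase

ltp_growth = ltp_2008 / ltp_2003                  # LTP growth, 2003 to 2008
ltp_over_htp_2003 = ltp_2003 / htp_2003_implied   # LTP lead over HTP in 2003
htp_over_ltp_2008 = htp_2008 / ltp_2008           # HTP lead over LTP in 2008

print(f"implied 2003 HTP/MS2 sites: {htp_2003_implied:.0f}")
print(f"LTP growth 2003 to 2008:    {ltp_growth:.2f}-fold")
print(f"LTP vs HTP/MS2 in 2003:     {ltp_over_htp_2003:.1f}-fold")
print(f"HTP/MS2 vs LTP in 2008:     {htp_over_ltp_2008:.1f}-fold")
```

The derived ratios agree with the statements in the text: LTP growth of less than two-fold, an LTP lead of more than 20-fold in 2003, and an HTP/MS2 lead of about 8-fold in 2008.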
A typical LTP paper usually focuses on just a few modification sites. When authors report newly discovered sites in such papers, it is de rigueur to experimentally verify the existence of the new site using robust techniques such as amino acid sequencing, phospho-specific antibodies, site-directed mutagenesis, dominant-negative constructs, etc. In contrast, an MS2 record can contain thousands of site assignments and, unlike the experimentally verified sites reported in LTP papers, all site assignments derived from MS2 spectra are probabilistic by nature.
Many site assignments from MS2 data have high probability scores and are almost assuredly correct. In all too many other cases, however, the probabilities are substantially lower, making the assignments suspect. Add to this the frequent ambiguity at the protein level (i.e., a peptide can come from more than one protein, or from multiple isoforms of the same protein), and it follows that the user MUST USE CAUTION WHEN ACCEPTING SITES THAT HAVE BEEN ASSIGNED WITH MS2 ALONE.
Two sources of information help users evaluate the strength of MS2 data presented in PSP. For sites discovered at CST, peptide spectra are provided to help the user evaluate the site assignments. Additionally, the Reference section of each Modification Site Page indicates how many of the reports are from CST MS2 experiments or from published HTP MS2 datasets. If there is only MS2 data supporting the site assignment, the user is advised to use appropriate caution when evaluating the site.
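As a rough illustration of the rule above, a user working with exported reference counts might flag sites that are supported by MS2 reports alone. The record layout, field names, and site labels below are hypothetical and do not reflect an actual PSP export format.

```python
from dataclasses import dataclass

@dataclass
class SiteEvidence:
    """Reference counts for one site (hypothetical layout, not a PSP schema)."""
    site: str
    ltp_refs: int = 0            # low-throughput literature reports
    published_ms2_refs: int = 0  # published HTP MS2 datasets
    cst_ms2_refs: int = 0        # CST MS2 experiments

    @property
    def ms2_only(self):
        """True when the site has MS2 support but no LTP report."""
        has_ms2 = (self.published_ms2_refs + self.cst_ms2_refs) > 0
        return has_ms2 and self.ltp_refs == 0

def needs_caution(sites):
    """Labels of sites whose assignment rests on MS2 data alone."""
    return [s.site for s in sites if s.ms2_only]

# Made-up records for illustration.
records = [
    SiteEvidence("PROT_A S15", ltp_refs=2, cst_ms2_refs=1),
    SiteEvidence("PROT_B T202", published_ms2_refs=3),
    SiteEvidence("PROT_C Y99", cst_ms2_refs=1),
]
print(needs_caution(records))
```

A flag of this kind is only a prompt to inspect the spectra and probability scores. It does not say whether the assignment is right or wrong.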
To accommodate new sites and peptides that are reported in the LTP and MS2 data, sequences are continuously updated. New sequences are searched and imported in a hierarchical fashion, first from non-redundant and relatively stable resources (UniProtKB, RefSeq, and Ensembl), then from redundant, computer-assigned resources (TrEMBL and GenPept).
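The hierarchical search order can be pictured as a simple priority lookup. The sketch below is an assumption about the general idea only: in-memory dictionaries stand in for the real databases, exact substring matching stands in for whatever matching PSP actually performs, and the accessions and sequences are invented.

```python
# Search order taken from the text: stable, non-redundant resources first.
SOURCE_PRIORITY = ["UniProtKB", "RefSeq", "Ensembl", "TrEMBL", "GenPept"]

def find_sequence(peptide, databases, priority=SOURCE_PRIORITY):
    """Return (source, accession) for the first source, in priority order,
    holding a sequence that contains `peptide`; return None if none does.

    `databases` maps a source name to {accession: sequence}.
    """
    for source in priority:
        for accession, sequence in databases.get(source, {}).items():
            if peptide in sequence:
                return source, accession
    return None

# Invented accessions and sequences for illustration.
toy_databases = {
    "TrEMBL": {"T0001": "MKLSRPRTASFAEGG"},
    "RefSeq": {"R0001": "MKLSRPRTASFAEGGLSQ"},
    "GenPept": {"G0001": "MAAPLLSQEDK"},
}
print(find_sequence("RTASF", toy_databases))   # found in RefSeq before TrEMBL
print(find_sequence("LLSQEDK", toy_databases)) # only GenPept holds this one
```

Because the loop follows the priority list rather than the dictionary order, a peptide present in both a stable and a redundant resource resolves to the stable one.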
Data is systematically curated for human, mouse, and to a lesser extent, rat proteins. Data from other warm-blooded vertebrates is curated on an ad hoc basis as it is published. A few Drosophila and other invertebrate proteins that play key roles in various signaling pathways are included. The sequences surrounding modification sites from multiple species are displayed at the bottom of each protein page.
By in vivo we mean "in living cells". This includes cell lines, primary cells, and tissue. In earlier versions of PhosphoSite®, only in vivo sites were curated. Subsequent comparison of sites phosphorylated in vitro with those phosphorylated in vivo indicated that the sequences flanking in vitro sites can closely resemble those flanking in vivo sites. Since then, we have included in vitro sites in PSP, but mark them in datasets so that the user can distinguish them from in vivo sites. This in vitro information may help the investigator infer which kinases might control the phosphorylation of an experimentally observed site in living systems.