The original "CropPAL1" website is now here.

Where do plant proteins go?

Proteins in crop plants have specific functions and locations within the plant cell. They generate, or are themselves, products important for human use. To improve crops, both the function and the location of a protein must be known. Subcellular location is an important clue to function and to how proteins interact within the cell's metabolism. It can be determined experimentally, by fluorescent protein tagging or by mass spectrometry detection in subcellular purifications, or predicted from protein sequence features.

The compendium of crop Proteins with Annotated Locations (cropPAL) collates more than 648 data sets from previously published fluorescent tagging or mass spectrometry studies and eight pre-computed subcellular predictions for 12 different crop proteomes. Crops included are banana (Musa acuminata), barley (Hordeum vulgare), canola (Brassica napus), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum bicolor), soybean (Glycine max), tomato (Solanum lycopersicum), wheat (Triticum aestivum), wine grape (Vitis vinifera), as displayed in the species choice below. The data collection, including metadata for proteins and studies, can be searched using the query builder below. The reciprocal BLAST searches for location data across all crop species and compares it with Arabidopsis data from SUBA4.

Find this resource useful? Please cite cropPAL (PubMed, Plant Cell Physiol).
Need large parts of the data at once? Bulk downloads available at RDA

Choose crops below then build a query with the questions below by pressing the → buttons.

Queries appear here....
... Organism is any of
... or try a quick text query:
Find proteins where the... (To start a query or add another filter to your query select a filter below and press the → button.)
Test for experimental (e.g. GFP or MS/MS) evidence of subcellular location.
... experimental location inferred by to be in
Many proteins have no experimental evidence for their subcellular location. Check what the predictors think.
... predicted location inferred by to be in
Find proteins where the... (To start a query or add another filter to your query select a filter below and press the → button.)
Search for or exclude proteins by keywords. The search is run against the descriptions of proteins in the CropPAL database. The syntax of this search supports extended regular expressions (see this site for more information). Choosing matches gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a description that contains leaf but does not contain seed, seeds, seedling, etc.
... protein description keyword(s)
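The +leaf -seed* example can be illustrated with a small stand-alone sketch. It only imitates the required (+), excluded (-) and trailing wildcard (*) operators described above. It is not the code the site runs, the function name matches_boolean is hypothetical, and terms without an operator are simply ignored here (an assumption made to keep the sketch short).

```python
import re

def matches_boolean(description: str, query: str) -> bool:
    """Rough imitation of '+word -word*' keyword matching (illustrative only)."""
    words = re.findall(r"\w+", description.lower())
    for term in query.lower().split():
        op = term[0] if term[0] in "+-" else ""
        body = term[len(op):]
        is_prefix = body.endswith("*")   # trailing * means "starts with"
        body = body.rstrip("*")
        hit = any(w.startswith(body) if is_prefix else w == body for w in words)
        if op == "+" and not hit:
            return False                 # a required term is missing
        if op == "-" and hit:
            return False                 # an excluded term is present
    return True

print(matches_boolean("Leaf senescence protein", "+leaf -seed*"))
print(matches_boolean("Leaf and seedling protein", "+leaf -seed*"))
```

The first description passes because it contains leaf and no word starting with seed; the second is rejected because of seedling.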
Filter for proteins based on various numeric data derived from sequence data. GRAVY (grand average of hydropathy) is defined in Kyte J., Doolittle R.F. (1982) "A simple method for displaying the hydropathic character of a protein". J. Mol. Biol. 157:105-132. doi:10.1016/0022-2836(82)90515-0. PMID: 7108955.
... physical property of is ← Should be a number
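As a rough illustration of the GRAVY property, the sketch below averages the Kyte-Doolittle hydropathy values over a sequence. The function name gravy is hypothetical, and skipping characters outside the 20 standard amino acids is an assumption; the values stored in the database may have been computed with different conventions.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Mean hydropathy of the recognised residues in a protein sequence."""
    # Assumption: anything that is not a standard residue is skipped.
    values = [KD_SCALE[aa] for aa in sequence.upper() if aa in KD_SCALE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)

print(gravy("MKTLLVAG"))
```

Positive values indicate an overall hydrophobic protein and negative values an overall hydrophilic one.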
Filter for proteins that are translated from genes on a specific chromosome or assembly (or set of scaffolds!).
... gene model is on

Search for proteins that are (or are not) in a list of Identifiers. Enter this list of Identifiers into the box below. See here for a summary of known cross references.

You can use "wildcards" with "like" and "not like" e.g. GO:%.

... EnsemblPlants identifier(s), alias or protein sequence feature is the list of: (e.g. GO:0008270, IPR017986 etc.)
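To show what a "like" pattern such as GO:% selects, here is a small sketch that mimics SQL-style wildcards on a list of identifiers. The % wildcard is the one named above; treating _ as a single-character wildcard follows standard SQL LIKE and is assumed to apply here too. The helper names are hypothetical, and the comparison is case-sensitive in this sketch, which may differ from the database.

```python
import re

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def like(value: str, pattern: str) -> bool:
    return like_to_regex(pattern).fullmatch(value) is not None

identifiers = ["GO:0008270", "IPR017986", "GO:0005739"]
go_terms = [i for i in identifiers if like(i, "GO:%")]
print(go_terms)
```

With "not like" the same pattern would instead keep only IPR017986.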
Find proteins with ... (To start a query or add another filter to your query select a filter below and press the → button.)
Search for or exclude proteins by keywords. The search is run against the literature titles and abstracts in the SUBA database. The syntax of this search supports extended regular expressions (see this site for more information). Choosing matches gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a title/abstract that contains leaf but does not contain seed, seeds, seedling, etc.

Reciprocal Blast

... Arabidopsis orthologs with blast match score greater than ← must be a number and Arabidopsis consensus location in
Subcellular Location:
Search for proteins that are homologous to proteins from another crop in a certain subcellular location. The homology types included are orthologs (any pairwise gene relation where the ancestor node is a speciation event) and paralogs (any pairwise gene relation where the ancestor node is a duplication event). For the more specific classifications displayed in the query results, please refer to: EnsemblPlants Protein trees

EnsemblPlants Homology Tree

... any homology with identity greater than ← must be a number and homology type of organism type and has experimentally localized (by MS/MS or GFP) it in:
Find proteins where the... (To start a query or add another filter to your query select a filter below and press the → button.)
... author's name like
Select a paper by pubmed
Select a paper by author name
... year of publication of localisation studies is between and ← Should be a year
Search for or exclude proteins by keywords. The search is run against the literature titles and abstracts in the SUBA database. The syntax of this search supports extended regular expressions (see this site for more information). Choosing matches gives you access to the match syntax of MySQL, e.g. entering +leaf -seed* in the keyword(s) box matches a title/abstract that contains leaf but does not contain seed, seeds, seedling, etc.
... publication title or abstract of the localisation study the keyword(s)
... author's affiliation
... author's affiliation in
Search for proteins by author affiliation
... author's affiliation in

Find proteins where the... (To start a query or add another filter to your query select a filter below and press the → button.)
Search for proteins that match this blast. Press the ‘Clear’ button to delete any content from the box.
... Protein contains fragments in list with bit-score ...
Sequences against CropPAL database

The bit-score is log2(Neff) - log2(E-value), where E-value = pval × Neff is the p-value times the effective search space size Neff. The larger the bit-score the better, since pval = P(random sequence having a better score) = 2^(-bit-score). The p-value measures the statistical significance of the match, but because Neff attempts were made to find a match, a correction is needed. Multiplying the p-value by the number of possible matches gives the E-value: the expected number of hits with a better match arising just by random chance. (See here and here [PDF]).
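The relationships between bit-score, p-value and E-value can be checked with a few lines of Python. This is a minimal sketch of the formulas given above; the function names and the search space size of 1e9 are illustrative assumptions, not values taken from the cropPAL BLAST service.

```python
import math

def p_value(bit_score: float) -> float:
    """pval = 2^(-bit-score): chance that a random sequence scores better."""
    return 2.0 ** (-bit_score)

def e_value(bit_score: float, n_eff: float) -> float:
    """E-value = pval * Neff: expected number of better hits by chance."""
    return p_value(bit_score) * n_eff

def bit_score(e_val: float, n_eff: float) -> float:
    """bit-score = log2(Neff) - log2(E-value)."""
    return math.log2(n_eff) - math.log2(e_val)

n_eff = 1e9  # hypothetical effective search space size
for bits in (30.0, 50.0, 100.0):
    print(bits, p_value(bits), e_value(bits, n_eff))
```

Converting a bit-score to an E-value and back with the same Neff returns the original bit-score, which is a quick way to confirm that the two formulas agree.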