Export Phenyx results

From GBWiki

Revision as of 16:46, 6 January 2009 by Pab (Talk | contribs)



Help

This page describes the use and content of the different exports that can be generated from the page "exports" in the Management Console.

Link to the User Manual

http://phenyx.vital-it.ch//docs/pwi/ManagementConsole.html#954868


To access the "exports" page

  • Go to the Management Console page > Jobs Management > export/browse files

To create a Phenyx job archive

  • Select a job ID
  • Select 'job directory archive' (tar.gz or zip format)
  • Click on the 'export' button

False discovery rate excel export

  • Generates an Excel report with two worksheets
  • The first worksheet contains a table with 4 columns:
    • zscore
    • number of peptide matches in the defined forward databank (true)
    • number of peptide matches in the defined decoy databank (false)
    • ratio of decoy to forward matches (column 3 / column 2)
  • The second worksheet contains a list of peptide matches with a flag indicating
    • if the match is in the decoy databank (random=TRUE)
    • if the match is in the forward databank (random=FALSE).
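
As an illustration of how the first worksheet relates to the second, the sketch below rebuilds the four columns from a list of flagged peptide matches. It assumes that each row counts the matches with a zscore at or above the row's threshold; the page does not state how Phenyx bins the zscores, and the function name `fdr_table` and the sample values are hypothetical.

```python
def fdr_table(matches, thresholds):
    """Build rows of (zscore threshold, forward count, decoy count, ratio).

    matches: list of (zscore, is_decoy) pairs, where is_decoy mirrors the
    random=TRUE/FALSE flag of the second worksheet.
    Assumption: counts are cumulative (zscore >= threshold).
    """
    rows = []
    for t in thresholds:
        n_forward = sum(1 for z, is_decoy in matches if z >= t and not is_decoy)
        n_decoy = sum(1 for z, is_decoy in matches if z >= t and is_decoy)
        # ratio = column 3 / column 2; undefined when there is no forward match
        ratio = n_decoy / n_forward if n_forward else None
        rows.append((t, n_forward, n_decoy, ratio))
    return rows


# Made-up sample data, for illustration only
matches = [(9.1, False), (8.4, False), (7.9, True),
           (7.2, False), (6.5, True), (6.1, False)]
table = fdr_table(matches, [6.0, 7.0, 8.0])
for row in table:
    print(row)
```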

You can select one or more job IDs as input, for instance when you have run one search on a forward databank and a separate search on a decoy databank.

  • Options for the export:

1) The search was run on two separate databanks, one forward and one decoy.

--dbtrue=uniprot_sprot --dbfalse=uniprot_sprot_rev

where dbtrue is the forward databank and dbfalse is the decoy databank. The databank names are those shown on the Submission page.

2) The search was run on a concatenated version of a forward and decoy databank.

--acregexpfalse='DECOY_.+'

You can define a regular expression that describes which entries are to be considered decoys. In this example, all entries whose AC has the form DECOY_nnnnn, where nnnnn varies, are treated as decoy entries.
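
To check which accession numbers an expression such as 'DECOY_.+' would select, it can be tried outside Phenyx first. The sketch below uses Python's `re` module and assumes the expression must match the whole AC; whether Phenyx anchors the expression this way is not stated on this page, and the sample ACs are made up.

```python
import re

# Same expression as in the --acregexpfalse example above
decoy_pattern = re.compile(r"DECOY_.+")


def is_decoy(ac):
    """Return True if the whole AC matches the decoy expression.

    Assumption: full-string matching; Phenyx's own matching rules may differ.
    """
    return decoy_pattern.fullmatch(ac) is not None


# Hypothetical accession numbers
acs = ["P12345", "DECOY_P12345", "DECOY_", "Q9XYZ1"]
flags = {ac: is_decoy(ac) for ac in acs}
print(flags)
```

Note that "DECOY_" alone is not selected, because `.+` requires at least one character after the prefix.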


Excel worksheet global export

The Excel report contains 4 worksheets.

1) summary worksheet

  • This page reflects the information found on the Proteins Overview page, first left panel.
  • Lines 1 to 7 display information about the job.
  • From line 10, information about the results is displayed in columns.
Columns A (Databank), B (AC) and C (ID)
describe the protein match: respectively the databank used, the accession number and the identification code.
Column D (Score)
displays protein score (AC score).
Column E (#valid pept seq)
displays the number of validated unique peptide sequences on the protein (note that a peptide sequence could be validated by several peptides). This number will directly influence the protein coverage.
Column F (#valid pept)
displays the number of valid peptides on the protein (by contrast to column E, redundant valid peptides are considered).
Column G (#pept)
displays the total number of identified peptides (valid and non-valid).
Column H (% Cov)
reports the % coverage of the protein by the validated non-redundant peptides.
Column I (Description)
provides protein description.
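
The relation between columns E, F and H can be illustrated with a small sketch: coverage depends only on the distinct validated sequences, so a redundant valid peptide raises column F without changing % Cov. This is a simplified calculation for illustration, not necessarily the exact procedure Phenyx uses; the function name and the sequences are hypothetical.

```python
def percent_coverage(protein_seq, peptides):
    """Percentage of protein residues covered by at least one peptide.

    Redundant peptides (same sequence seen several times) are collapsed
    with set(), since they cover the same residues.
    """
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for pep in set(peptides):
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # look for further occurrences of the same sequence
            start = protein_seq.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_seq)


# Made-up 22-residue protein; "TAYIAK" was validated twice
protein = "MKTAYIAKQRQISFVKSHFSRQ"
valid_peptides = ["TAYIAK", "TAYIAK", "ISFVK"]
coverage = percent_coverage(protein, valid_peptides)
print(len(set(valid_peptides)), len(valid_peptides), coverage)
```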

2) parameters worksheet

  • Lines 1 up to 7 display information about the job.
  • From line 10, information on used parameters is given.
Taxonomy
number refers to the NCBI Taxonomy ID (see the NCBI Taxonomy Browser).

3) DB entries worksheet

  • This page gives complete information about protein matches (main proteins and subset proteins)
Column A (Databank)
displays Databank information.
Column B (AC)
displays Accession number (AC) information.
Column C (orig AC)
displays the original AC when the match is on a uniprot_sprot “secondary accession number”. The original AC (“primary accession number”) is the one that must be cited in publications.
Column D (ID)
gives the Identification code.
Column E (Score)
displays protein score (AC score).
Column F (#valid pept seq)
displays the number of validated unique peptide sequences on the protein (note that a peptide sequence could be validated by several peptides). This number will directly influence the protein coverage.
Column G (#valid pept)
displays the number of valid peptides on the protein (by contrast to column F, redundant valid peptides are considered).
Column H (#pept)
displays the total number of identified peptides (valid and non-valid).
Column I (% Cov)
reports the % coverage of the protein by the validated non-redundant peptides.
Column J (pI)
reports theoretical calculated protein isoelectric point (pI).
Column K (Mass)
reports theoretical calculated protein mass.
Column L (key)
gives unique identity for protein match (key).
Column M (Subset of)
displays information on the main protein match, in case the matched protein is a subset of a main protein.
Column N (Contains)
lists the proteins that are subsets of the matched protein (in case the matched protein is a main protein).
Column O (Description)
gives protein description.

4) MSMS Matches worksheet

  • This page displays information on all the matched peptides.
Column A (Compound Description)
gives the description of the compound (spectrum) that produced the match.
Column B (Validity)
shows the validity automatically assigned to the peptide match according to the acceptance parameters/thresholds defined for the search.
Column C (Valid status)
reports the user validation status. This status is “true” if valid (+) and “false” if non-valid (-). If the user does not change the validation status through Manual Validation, columns B and C display identical information.
Column D (Sequence)
displays peptide sequence.
Column E (Modifs)
reports amino acid modifications on the peptide sequence. Modification name is given with amino acid position in brackets.
Column F (z)
gives the charge state of the theoretical peptide.
Column G (m/z)
reports the observed mass-to-charge (m/z) ratio of the parent ion.
Column H (delta m/z)
reports the delta mass-to-charge. The value is the difference between the theoretical m/z of the matched peptide and the observed m/z of the parent ion.
Column I (z-score)
gives peptide score.
Column J (intensity)
reports peak intensity for the precursor if present in the datafile.
Column K (charges)
lists the charges assigned to the compound by the spectrometer.
Column L (acquTime)
reports acquisition time if available.
Column M (DB matches keys)
lists the proteins that contain the peptide.
Column N (key)
gives unique identity for the peptide match.
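
The relation between columns F, G and H can be sketched as follows. The example assumes a protonated peptide and takes the difference as theoretical minus observed; the page only says "difference", so the sign convention is an assumption, and the function names and sample values are hypothetical.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def theoretical_mz(neutral_mass, z):
    """m/z of a peptide of given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON_MASS) / z


def delta_mz(neutral_mass, z, observed_mz):
    """Difference between theoretical and observed m/z.

    Assumption: theoretical minus observed; the export may use the
    opposite sign.
    """
    return theoretical_mz(neutral_mass, z) - observed_mz


# Made-up peptide: neutral mass 1000.0 Da, charge 2, observed at m/z 501.0
delta = delta_mz(1000.0, 2, 501.0)
print(round(delta, 6))
```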

TPP pepxml export

This export generates a pepXML-formatted file from a selected job ID.

  • Currently available for jobs submitted with peaklists in mgf and mzxml format.

MCP Excel worksheet global export

This export generates an Excel report that supports international standards. Default arguments are:

--selectedpm --pwiurl=http://phenyx.vital-it.ch/pwi --pm_minhit_xjobs=2 --pm_minhit_injob=2

where:

  1. --selectedpm considers only Phenyx validated peptides
  2. --pwiurl=... is set so that http links can be included in the Excel sheet
  3. --pm_minhit_xjobs=number defines the minimal number of peptide sequences required for a protein to be selected across different jobs
  4. --pm_minhit_injob=number defines the minimal number of peptide sequences required for a protein to be valid within a job

Calibrated mgf export

The calibrated mgf export creates a new peaklist file where the peaks are "corrected" using the median of the deviations.

To visualize what it does, generate the Error distribution report (pdf) export and look at the graphs on the right-hand side of the report. These correspond to the "new" values.

The tool does not perform a complex recalibration procedure. It has been designed for simple shifting of masses, mainly to reduce the search space for miscalibrated low-resolution instruments such as traps or older QTofs.
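
The idea of a median-based shift can be sketched in a few lines. This is only an illustration of the principle, assuming that each deviation is observed minus theoretical m/z for a matched peptide and that the correction is a constant absolute shift; the actual export may work differently (for example in ppm), and the function name and values are hypothetical.

```python
from statistics import median


def calibrate(observed_mz, deviations):
    """Shift all observed m/z values by the median deviation.

    deviations: observed - theoretical m/z for matched peptides
    (assumed definition). The median is robust to a few wrong matches.
    """
    shift = median(deviations)
    return [mz - shift for mz in observed_mz]


# Made-up deviations with one outlier (1.5) that the median ignores
deviations = [0.20, 0.30, 0.25, 1.50, 0.28]
corrected = calibrate([500.28, 750.28], deviations)
print(corrected)
```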


Remote export

For example, to export job 55892 for msight (the script is pidres2msight.pl), request the following URL:

http://phenyx.vital-it.ch/cgi/cgiUserJob.pl?jobid=55892&action=export&exportcommand=msight&showForm=0

NB: your web client (Java HttpClient, curl, wget, etc.) must have saved the login cookies.
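
A minimal sketch of such a request using only the Python standard library is shown below. It assumes the login cookies were previously saved to a Netscape/Mozilla-format cookie file (as written by curl or wget); the function names and the cookie file path are hypothetical, and only the URL parameters come from the example above.

```python
from http.cookiejar import MozillaCookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

BASE_URL = "http://phenyx.vital-it.ch/cgi/cgiUserJob.pl"


def export_url(jobid, exportcommand):
    """Build the remote export URL for a job and an export command."""
    params = {
        "jobid": jobid,
        "action": "export",
        "exportcommand": exportcommand,
        "showForm": 0,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_export(jobid, exportcommand, cookie_file):
    """Download an export, sending the login cookies saved in cookie_file.

    Assumption: cookie_file is in Netscape/Mozilla cookies.txt format.
    """
    jar = MozillaCookieJar(cookie_file)
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = build_opener(HTTPCookieProcessor(jar))
    with opener.open(export_url(jobid, exportcommand)) as response:
        return response.read()


url = export_url(55892, "msight")
print(url)
```

Calling `fetch_export(55892, "msight", "cookies.txt")` would then return the raw export content, provided the saved session is still valid.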
