Requirements
- Python 3.9 or greater.
- Python Packages: numpy, pyteomics, pyopenms, pulp, PyYAML, matplotlib, lxml, nicegui, pywebview, tqdm. These should be installed automatically during installation.
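To confirm that an environment meets these requirements, a check along the following lines may help. This is a minimal sketch using only the standard library; the helper names `python_ok` and `missing_packages` are hypothetical and are not part of pyproteininference, and the distribution names are taken as written in the list above.

```python
import sys
from importlib import metadata

# Distribution names as written in the requirements list above.
REQUIRED = ["numpy", "pyteomics", "pyopenms", "pulp", "PyYAML",
            "matplotlib", "lxml", "nicegui", "pywebview", "tqdm"]


def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=REQUIRED):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```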
Quick Start Guide
- Install the package using pip:
    pip install pyproteininference
- Run the standard command line tool on an idXML file:
    protein_inference_cli.py \
      -f /path/to/target/file1.idXML \
      -db /path/to/database/file.fasta \
      -y /path/to/params.yaml
- Run the standard command line tool on an mzIdentML file:
    protein_inference_cli.py \
      -f /path/to/target/file1.mzid \
      -db /path/to/database/file.fasta \
      -y /path/to/params.yaml
- Run the standard command line tool on a pepXML file:
    protein_inference_cli.py \
      -f /path/to/target/file1.pepXML \
      -db /path/to/database/file.fasta \
      -y /path/to/params.yaml
- Run the standard command line tool on tab-delimited results taken directly from Percolator:
    protein_inference_cli.py \
      -t /path/to/target/file.txt \
      -d /path/to/decoy/file.txt \
      -db /path/to/database/file.fasta \
      -y /path/to/params.yaml
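The invocations above differ only in how the input files are passed: a single file via `-f`, or a Percolator target and decoy pair via `-t` and `-d`. As a minimal sketch, the argument list could be assembled in Python as follows. The helper `build_command` is hypothetical and not part of pyproteininference; it only uses the flags shown above.

```python
def build_command(database, params, target, decoy=None):
    """Build the argument list for protein_inference_cli.py.

    With only `target`, the file is passed via -f (as in the idXML,
    mzIdentML and pepXML examples). When `decoy` is also given, the
    pair is passed via -t and -d (as in the Percolator example).
    """
    cmd = ["protein_inference_cli.py"]
    if decoy is not None:
        cmd += ["-t", str(target), "-d", str(decoy)]
    else:
        cmd += ["-f", str(target)]
    cmd += ["-db", str(database), "-y", str(params)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to execute it.
    print(" ".join(build_command(
        "/path/to/database/file.fasta",
        "/path/to/params.yaml",
        "/path/to/target/file1.idXML",
    )))
```

Building the command as a list, rather than a single string, avoids shell quoting problems when paths contain spaces.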
- Specifying parameters. The two parameters most commonly changed are the inference type and the decoy symbol (used to distinguish decoy proteins from target proteins). To change them, create a file called params.yaml as follows:
    parameters:
      inference:
        inference_type: parsimony
      identifiers:
        decoy_symbol: "decoy_"
The inference type can be one of: parsimony, peptide_centric, inclusion, exclusion, or first_protein. All parameters are optional, so you only need to define the ones you want to alter. Parameters that are not defined are set to default values. The default values are listed under Default Parameters.
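The idea that undefined parameters fall back to defaults can be illustrated with a recursive dictionary merge. This is only a sketch of the concept, not the package's actual implementation; `merge_params` is a hypothetical helper, and `DEFAULTS` holds just a small subset of the default values for illustration.

```python
import copy

# A small subset of the default parameters, for illustration only.
DEFAULTS = {
    "parameters": {
        "inference": {"inference_type": "parsimony"},
        "identifiers": {"decoy_symbol": "##", "isoform_symbol": "-"},
    }
}


def merge_params(defaults, overrides):
    """Return a copy of `defaults` with values from `overrides` applied.

    Nested dictionaries are merged key by key, so a key missing from
    `overrides` keeps its default value.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Only the decoy symbol is overridden; everything else is kept.
    user = {"parameters": {"identifiers": {"decoy_symbol": "decoy_"}}}
    print(merge_params(DEFAULTS, user))
```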
- Full Parameter Specifications. See below for a full standard parameter file:
Default Parameters
    parameters:
      general:
        export: peptides
        fdr: 0.01
        picker: True
        tag: example_tag
        xml_parser: openms
      data_restriction:
        pep_restriction: 0.9
        peptide_length_restriction: 7
        q_value_restriction: .9
        custom_restriction: None
        max_allowed_alternative_proteins: 50
      score:
        protein_score: best_peptide_per_protein
        psm_score: posterior_error_prob
        psm_score_type: multiplicative
      identifiers:
        decoy_symbol: "##"
        isoform_symbol: "-"
        reviewed_identifier_symbol: "sp|"
      inference:
        inference_type: parsimony
        grouping_type: parsimonious_grouping
      digest:
        digest_type: trypsin
        missed_cleavages: 3
      parsimony:
        lp_solver: pulp
        shared_peptides: all
      peptide_centric:
        max_identifiers: 5
These parameter values are only a suggestion; adjust them to suit your own data. For a full description of each parameter and all of its options, see the in-depth parameter file description.
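Before running the tool with altered parameters, a quick sanity check can catch obvious mistakes. The sketch below works on a parameter file that has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). The helper `check_params` is hypothetical; it checks only that the inference type is one of the five listed values and that `fdr`, `pep_restriction` and `q_value_restriction`, when numeric, lie between 0 and 1, which is an assumption based on these being probabilities. The tool's own validation may differ.

```python
INFERENCE_TYPES = {"parsimony", "peptide_centric", "inclusion",
                   "exclusion", "first_protein"}


def _is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_params(params):
    """Return a list of problems found in `params`; empty if none."""
    problems = []
    section = params.get("parameters", {})

    inference_type = section.get("inference", {}).get("inference_type")
    if inference_type is not None and inference_type not in INFERENCE_TYPES:
        problems.append(f"unknown inference_type: {inference_type!r}")

    # Collect the probability-like values that are present.
    candidates = {"fdr": section.get("general", {}).get("fdr")}
    restrictions = section.get("data_restriction", {})
    for key in ("pep_restriction", "q_value_restriction"):
        candidates[key] = restrictions.get(key)

    # Non-numeric values (such as the string "None") are skipped.
    for key, value in candidates.items():
        if _is_number(value) and not 0 <= value <= 1:
            problems.append(f"{key} should be between 0 and 1, got {value}")
    return problems


if __name__ == "__main__":
    example = {"parameters": {"general": {"fdr": 0.01},
                              "inference": {"inference_type": "parsimony"}}}
    print(check_params(example) or "no problems found")
```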
- Run the standard command line tool again, this time specifying the parameters as above:
    protein_inference_cli.py \
      -t /path/to/target/file.txt \
      -d /path/to/decoy/file.txt \
      -db /path/to/database/file.fasta \
      -y /path/to/params.yaml
-
Running with docker
- Either pull the image from Docker Hub:
    docker pull thinkle12/pyproteininference:latest
- Or clone the repository and build the image with the following commands:
    git clone REPOSITORY_URL
    cd pyproteininference
    docker build -t pyproteininference:latest .
- Run the tool, making sure to volume mount the directory that contains your input data and parameters. In the example below, the local directory is /path/to/local/directory and the path in the container is /data:
    docker run -v /path/to/local/directory/:/data \
      -it hinklet/pyproteininference:latest \
      python /usr/local/bin/protein_inference_cli.py \
      -f /data/input_file.txt \
      -db /data/database.fasta \
      -y /data/parameters.yaml \
      -o /data/
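Because the container only sees the mounted directory, every path passed to the tool must be given as it appears under /data, not as it appears on the host. The sketch below shows one way to translate host paths and assemble the `docker run` argument list. Both helpers, `to_container_path` and `docker_command`, are hypothetical, and the image name is left as a parameter; the sketch assumes POSIX-style host paths.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_dir, container_dir="/data"):
    """Map a path under `host_dir` to its location under `container_dir`.

    Raises ValueError if `host_path` is not inside `host_dir`, since such
    a file would not be visible inside the container.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(container_dir) / relative)


def docker_command(image, host_dir, input_file, database, params,
                   container_dir="/data"):
    """Build a `docker run` argument list mirroring the command above."""
    def inside(path):
        return to_container_path(path, host_dir, container_dir)

    return [
        "docker", "run",
        "-v", f"{PurePosixPath(host_dir)}:{container_dir}",
        "-it", image,
        "python", "/usr/local/bin/protein_inference_cli.py",
        "-f", inside(input_file),
        "-db", inside(database),
        "-y", inside(params),
        "-o", container_dir,
    ]


if __name__ == "__main__":
    print(to_container_path("/path/to/local/directory/input_file.txt",
                            "/path/to/local/directory/"))
```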
- Get the command line help via Docker:
    docker run thinkle12/pyproteininference:latest \
      python /usr/local/bin/protein_inference_cli.py --help