epitopepredict package¶
Submodules¶
epitopepredict.analysis module¶
epitopepredict analysis methods Created September 2013 Copyright (C) Damien Farrell This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-
epitopepredict.analysis.
align_blast_results
(df, aln=None, idkey='accession', productkey='definition')[source]¶ Get gapped alignment from blast results using muscle aligner.
-
epitopepredict.analysis.
create_nmers
(df, genome, length=20, seqkey='translation', key='nmer', how='split', margin=0)[source]¶ Get n-mer peptides surrounding a set of sequences using the host protein sequence.
Parameters: - df – input dataframe with sequence name and start/end coordinates
- genome – genome dataframe with host sequences
- length – length of nmer to return
- seqkey – column name of sequence to be processed
- how – method to create the n-mer: ‘split’ will try to split the sequence into overlapping n-mers if its length is larger than length; ‘center’ will center the peptide
- margin – do not split sequences below length+margin
Returns: pandas Series with nmer values
-
epitopepredict.analysis.
dbscan
(B=None, x=None, dist=7, minsize=4)[source]¶ Use dbscan algorithm to cluster binder positions
-
epitopepredict.analysis.
epitope_conservation
(peptides, alnrows=None, proteinseq=None, blastresult=None, blastdb=None, perc_ident=50, equery='srcdb_refseq[Properties]')[source]¶ Find and visualise conserved peptides in a set of aligned sequences.
Parameters: - peptides – a list of peptides/epitopes
- alnrows – a dataframe of previously aligned sequences e.g. custom strains
- proteinseq – a sequence to blast and get an alignment for
- blastresult – a file of saved blast results in plain csv format
- equery – blast query string
Returns: Matrix of 0 or 1 for conservation for each epitope/protein variant
-
epitopepredict.analysis.
find_clusters
(binders, dist=None, min_binders=2, min_size=12, max_size=50, genome=None, colname='peptide')[source]¶ Get clusters of binders for a set of binders.
Parameters: - binders – dataframe of binders
- dist – distance over which to apply clustering
- min_binders – minimum binders to be considered a cluster
- min_size – smallest cluster length to return
- max_size – largest cluster length to return
- colname – name for cluster sequence column
Returns: a pandas Series with the new n-mers (may be longer than the initial dataframe if splitting)
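As a rough illustration of position-based clustering, the sketch below groups sorted binder start positions whenever the gap to the previous position is within `dist`, then keeps groups with at least `min_binders` members. This is a simple gap-based grouping, not the DBSCAN algorithm named above, and `group_positions` is a hypothetical helper rather than part of epitopepredict.

```python
def group_positions(positions, dist=7, min_binders=2):
    """Group 1-D positions whose neighbouring gaps are <= dist."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # start a new group when the gap to the previous position is too large
        if current and pos - current[-1] > dist:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    # discard groups with too few binders
    return [c for c in clusters if len(c) >= min_binders]


clusters = group_positions([3, 5, 9, 40, 44, 90], dist=7, min_binders=2)
print(clusters)
```

The isolated position 90 is dropped because it does not reach `min_binders`.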
-
epitopepredict.analysis.
find_conserved_peptide
(peptide, recs)[source]¶ Find sequences where a peptide is conserved
-
epitopepredict.analysis.
find_conserved_sequences
(seqs, alnrows)[source]¶ Find if sub-sequences are conserved in given set of aligned sequences.
Parameters: - seqs – a list of sequences to find
- alnrows – a dataframe of aligned protein sequences
Returns: a pandas DataFrame of 1 or 0 values for each protein/search sequence
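The shape of the 1/0 result can be pictured with a small standalone sketch. It assumes exact substring matching after removing gap characters, which may differ from what the library does; `conservation_matrix` and the example sequences are made up for illustration.

```python
def conservation_matrix(seqs, aligned):
    """Return {protein: {subseq: 0 or 1}} for exact matches.

    aligned maps a protein name to its aligned sequence; gap characters
    ('-') are removed before searching (an assumption of this sketch).
    """
    matrix = {}
    for name, aln in aligned.items():
        ungapped = aln.replace("-", "")
        matrix[name] = {s: int(s in ungapped) for s in seqs}
    return matrix


aligned = {
    "strain_A": "MKT--AYIAKQR",
    "strain_B": "MKTLLAYVAKQR",
}
matrix = conservation_matrix(["AYIAK", "AKQR"], aligned)
for name, row in matrix.items():
    print(name, row)
```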
-
epitopepredict.analysis.
get_AAcontent
(df, colname, amino_acids=None)[source]¶ Amino acid composition for dataframe with sequences
-
epitopepredict.analysis.
get_orthologs
(seq, db=None, expect=1, hitlist_size=400, equery=None, email='')[source]¶ Fetch orthologous sequences using remote or local blast and return the records as a dataframe.
Parameters: - seq – sequence to blast
- db – the name of a local blast db
- expect – expect value
- equery – Entrez Gene Advanced Search options, (see http://www.ncbi.nlm.nih.gov/books/NBK3837/)
Returns: blast results in a pandas dataframe
-
epitopepredict.analysis.
get_overlaps
(df1, df2, label='overlap', how='inside')[source]¶ Overlaps for 2 sets of sequences where the positions in host sequence are stored in each dataframe as ‘start’ and ‘end’ columns
Parameters: - df1 – first set of sequences, a pandas dataframe with columns called start/end or pos
- df2 – second set of sequences
- label – label for overlaps column
- how – may be ‘any’ or ‘inside’
Returns: First DataFrame with no. of overlaps stored in a new column
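The difference between the two `how` modes can be sketched with plain (start, end) tuples. The semantics used here are an assumption: ‘inside’ is taken to mean full containment and ‘any’ a partial overlap. `count_overlaps` is a hypothetical helper, not the library function.

```python
def count_overlaps(first, second, how="inside"):
    """For each (start, end) in first, count the intervals in second
    that overlap it.

    how='inside' requires the second interval to lie fully within the
    first; how='any' accepts any shared position (assumed semantics).
    """
    counts = []
    for s1, e1 in first:
        n = 0
        for s2, e2 in second:
            if how == "inside":
                hit = s1 <= s2 and e2 <= e1
            else:
                hit = s2 <= e1 and s1 <= e2
            n += int(hit)
        counts.append(n)
    return counts


first = [(10, 30), (50, 60)]
second = [(12, 20), (25, 35), (58, 70)]
inside_counts = count_overlaps(first, second, how="inside")
any_counts = count_overlaps(first, second, how="any")
print(inside_counts, any_counts)
```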
-
epitopepredict.analysis.
net_charge
(df, colname)[source]¶ Net peptide charge for dataframe with sequences
-
epitopepredict.analysis.
peptide_properties
(df, colname='peptide')[source]¶ Find hydrophobicity and net charge for peptides
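A minimal sketch of the two properties follows. It is not the library's calculation: net charge is approximated by counting K/R as +1 and D/E as -1, and hydrophobicity is the mean Kyte-Doolittle hydropathy, both of which are assumptions since the reference does not state the method or scale used.

```python
# Kyte-Doolittle hydropathy values; the scale used by the library is not
# stated in this reference, so this choice is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def simple_net_charge(seq):
    """Count basic residues (K, R) as +1 and acidic residues (D, E) as -1."""
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


def mean_hydrophobicity(seq):
    """Average hydropathy over all residues of a non-empty peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


for pep in ["KRAA", "DEKL"]:
    print(pep, simple_net_charge(pep), round(mean_hydrophobicity(pep), 2))
```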
-
epitopepredict.analysis.
prediction_coverage
(expdata, binders, key='sequence', perc=50, verbose=False)[source]¶ Determine hit rate of predictions in experimental data by finding how many top peptides are needed to cover perc% of the positives.
Parameters: - expdata – dataframe of experimental data with peptide sequence and name column
- binders – dataframe of ranked binders created from predictor
- key – column name in expdata for sequence
Returns: fraction of predicted binders required to find perc total response
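The idea of a coverage fraction can be sketched as follows: walk down a ranked list of predicted peptides until the requested percentage of experimentally positive peptides has been seen. This is one reading of the description above, written with plain lists; `coverage_fraction` is a hypothetical name and the ranked list is assumed to contain unique peptides.

```python
import math


def coverage_fraction(ranked, positives, perc=50):
    """Fraction of the ranked list needed to recover perc% of positives.

    Returns None if the target is never reached.
    """
    positives = set(positives)
    target = math.ceil(len(positives) * perc / 100)
    found = 0
    for i, pep in enumerate(ranked, start=1):
        if pep in positives:
            found += 1
        if found >= target:
            return i / len(ranked)
    return None


ranked = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
positives = {"P2", "P5", "P7", "P9"}
half = coverage_fraction(ranked, positives, perc=50)
full = coverage_fraction(ranked, positives, perc=100)
print(half, full)
```

Here two of the four positives are needed for 50% coverage and the second one appears at rank 5 of 8. Full coverage is never reached because P9 was not predicted.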
epitopepredict.app module¶
epitopepredict.base module¶
MHC prediction base module for core classes Created November 2013 Copyright (C) Damien Farrell
-
class
epitopepredict.base.
BasicMHCIPredictor
(data=None, scoring=None)[source]¶ Bases:
epitopepredict.base.Predictor
Built-in basic MHC-I predictor. Should be used as a fallback if no other predictors available.
-
predict
(peptides, allele='HLA-A*01:01', name='temp', **kwargs)[source]¶ Encode and predict peptides with saved regressor
-
predict_peptides
(peptides, **kwargs)[source]¶ Override so that models can be trained before predictions are made.
-
-
class
epitopepredict.base.
DataFrameIterator
(files)[source]¶ Bases:
object
Simple iterator to get dataframes from a path out of memory
-
class
epitopepredict.base.
DummyPredictor
(data=None, scoring=None)[source]¶ Bases:
epitopepredict.base.Predictor
Returns random scores. Used for testing
-
class
epitopepredict.base.
IEDBMHCIIPredictor
(data=None)[source]¶ Bases:
epitopepredict.base.Predictor
Using IEDB MHC-II method, requires tools to be installed locally
-
predict
(sequence=None, peptides=None, length=15, overlap=None, show_cmd=False, allele='HLA-DRB1*01:01', method='IEDB_recommended', name='', **kwargs)[source]¶ Use IEDB MHC-II python module to get predictions. Requires that the IEDB MHC-II tools are installed locally. A sequence argument is provided since the cmd line only accepts whole sequence to be fragmented.
-
-
class
epitopepredict.base.
IEDBMHCIPredictor
(data=None, method='IEDB_recommended')[source]¶ Bases:
epitopepredict.base.Predictor
Using IEDB tools method, requires iedb-mhc1 tools. Tested with version 2.17
-
predict
(sequence=None, peptides=None, length=11, overlap=1, allele='HLA-A*01:01', name='', method=None, show_cmd=False, **kwargs)[source]¶ Use IEDB MHCI python module to get predictions. Requires that the IEDB MHC tools are installed locally.
Parameters: - sequence – a sequence to be predicted
- peptides – a list of arbitrary peptides instead of single sequence
Returns: pandas dataframe
-
-
class
epitopepredict.base.
MHCFlurryPredictor
(data=None, **kwargs)[source]¶ Bases:
epitopepredict.base.Predictor
Predictor using MHCFlurry for MHC-I predictions. Requires you to install the python package mhcflurry with dependencies. see https://github.com/hammerlab/mhcflurry
-
predict
(peptides=None, overlap=1, show_cmd=False, allele='HLA-A0101', name='', **kwargs)[source]¶ Uses mhcflurry python classes for prediction
-
-
class
epitopepredict.base.
NetMHCIIPanPredictor
(data=None)[source]¶ Bases:
epitopepredict.base.Predictor
netMHCIIpan v3.0 predictor
-
class
epitopepredict.base.
NetMHCPanPredictor
(data=None, scoring='affinity')[source]¶ Bases:
epitopepredict.base.Predictor
netMHCpan 4.1b predictor see http://www.cbs.dtu.dk/services/NetMHCpan/ Default scoring is affinity predictions. To get newer scoring behaviour pass scoring=’ligand’ to constructor.
-
class
epitopepredict.base.
Predictor
(data=None)[source]¶ Bases:
object
Base class to handle generic predictor methods, usually these will wrap methods from other modules and/or call command line predictors. Subclass for specific functionality
-
evaluate
(df, key, value, operator='<')[source]¶ Evaluate binders less than or greater than a cutoff. This method is called by all predictors to get binders
-
get_allele_cutoffs
(cutoff=0.95)[source]¶ Get per allele percentile cutoffs using precalculated quantile values.
-
get_binders
(cutoff=0.95, cutoff_method='default', path=None, name=None, drop_columns=False, limit=None, **kwargs)[source]¶ Get the top scoring binders. With the default method, cutoffs are derived from the pre-defined percentile cutoffs for some known antigens. For per protein cutoffs the rank can be used instead. This will give slightly different results.
Parameters: - path – use results in a path instead of loading all at once, conserves memory
- cutoff – percentile cutoff (default), absolute score or a rank value within each sequence
- cutoff_method – ‘default’, ‘score’ or ‘rank’
- name – name of a specific protein/sequence
Returns: binders above cutoff in all alleles, pandas dataframe
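To make the idea of a percentile cutoff concrete, the sketch below keeps the peptides whose score lies at or above a given quantile of all scores for one allele. It is a simplification: the library derives its default cutoffs from precalculated per-allele quantiles, whereas this example computes a nearest-rank quantile from the data at hand and assumes higher scores mean stronger binding. `top_quantile` is a hypothetical helper.

```python
def top_quantile(scores, cutoff=0.95):
    """Keep items scoring at or above the given quantile of all scores."""
    values = sorted(scores.values())
    # nearest-rank quantile: position of the threshold in the sorted list
    idx = min(int(cutoff * len(values)), len(values) - 1)
    threshold = values[idx]
    return {pep: s for pep, s in scores.items() if s >= threshold}


# eight made-up peptides with scores 1..8 for a single allele
scores = {"pep%d" % i: float(i) for i in range(1, 9)}
binders = top_quantile(scores, cutoff=0.75)
print(sorted(binders))
```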
-
get_global_rank
(score, allele)[source]¶ Get an allele specific score percentile from precalculated quantile data.
-
load
(path=None, names=None, compression='infer', file_limit=None)[source]¶ Load results from path or single file. See results_from_csv for args.
-
plot
(name, **kwargs)[source]¶ Use module level plotting.mpl_plot_tracks method for predictor plot.
Parameters: - name – name of protein to plot
- n – min no. of alleles to be visible
- perc – percentile cutoff for score
- cutoff_method – method to use for cutoffs
-
predict
(sequence=None, peptides=None, length=9, overlap=1, allele='', name='')[source]¶ Does the actual scoring of a sequence. Should be overridden. Should return a pandas DataFrame
-
predict_peptides
(peptides, threads=1, path=None, overwrite=True, name=None, **kwargs)[source]¶ Predict a set of individual peptides without splitting them. This is a wrapper for _predict_peptides to allow multiprocessing.
Parameters: - peptides – list of peptides
- alleles – list of alleles to predict
- drop_columns – only keep default columns
Returns: dataframe with results
-
predict_sequences
(recs, alleles=[], path=None, verbose=False, names=None, key='locus_tag', seqkey='translation', threads=1, **kwargs)[source]¶ Get predictions for a set of proteins over multiple alleles. Can be run in parallel using the threads parameter. This is a wrapper for _predictSequences with the same args.
Parameters: - recs – list or dataframe with sequences
- path – if provided, save results to this file
- threads – number of processors
- key – seq/protein name key
- seqkey – key for sequence column
- length – length of peptide to split sequence into
Returns: a dataframe of predictions over multiple proteins
-
prepare_data
(result, name, allele)[source]¶ Put raw prediction data into a DataFrame and rank it. Can be overridden for custom processing.
-
promiscuous_binders
(binders=None, name=None, cutoff=0.95, cutoff_method='default', n=1, unique_core=True, limit=None, **kwargs)[source]¶ Uses the parameters for get_binders if no binders are provided.
Parameters: - binders – can provide a precalculated list of binders
- name – specific protein, optional
- value – to pass to get_binders
- cutoff_method – ‘rank’, ‘score’ or ‘global’
- cutoff – cutoff for get_binders (rank, score or percentile)
- n – min number of alleles
- unique_core – removes peptides with duplicate cores and picks the most promiscuous and highest ranked, used for mhc-II predictions
- limit – limit the number of peptides per protein, default None
Returns: a pandas dataframe
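The notion of promiscuity (a peptide binding in at least n alleles) can be sketched without the library. The example assumes the binders have already passed a cutoff and are available as (peptide, allele) pairs. `promiscuous` is a hypothetical helper and does not handle the unique_core or limit options.

```python
from collections import defaultdict


def promiscuous(binders, n=2):
    """Return {peptide: number of distinct alleles} for peptides that
    bind in at least n alleles.

    binders is an iterable of (peptide, allele) pairs that passed a cutoff.
    """
    alleles = defaultdict(set)
    for peptide, allele in binders:
        alleles[peptide].add(allele)
    return {p: len(a) for p, a in alleles.items() if len(a) >= n}


binders = [
    ("AYIAKQRQI", "HLA-DRB1*01:01"),
    ("AYIAKQRQI", "HLA-DRB1*03:01"),
    ("KTAYIAKQR", "HLA-DRB1*01:01"),
]
result = promiscuous(binders, n=2)
print(result)
```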
-
ranked_binders
(names=None, how='median', cutoff=None)[source]¶ Get the median/mean rank of each binder over all alleles.
Parameters: - names – list of protein names, otherwise all current data used
- how – method to use for rank selection, ‘median’ (default), ‘best’ or ‘mean’
- cutoff – apply a rank cutoff if we want to filter (optional)
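The three values of `how` can be pictured as different ways of collapsing a peptide's per-allele ranks into one number. In this sketch ‘best’ is taken to mean the lowest rank, which is an assumption; `summarise_ranks` and the example ranks are illustrative only.

```python
from statistics import mean, median


def summarise_ranks(ranks, how="median"):
    """Collapse per-allele ranks into a single value per peptide.

    ranks maps a peptide to its list of ranks, one per allele.
    """
    funcs = {"median": median, "mean": mean, "best": min}
    func = funcs[how]
    return {pep: func(r) for pep, r in ranks.items()}


ranks = {"AYIAKQRQI": [1, 5, 3], "KTAYIAKQR": [2, 2, 8]}
by_median = summarise_ranks(ranks, how="median")
by_best = summarise_ranks(ranks, how="best")
print(by_median, by_best)
```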
-
save
(prefix='_', filename=None, compression=None)[source]¶ Save all current predictions dataframe with some metadata.
Parameters: - prefix – if writing to a path, the prefix name
- filename – if saving all to a single file
- compression – a string representing the compression to use, allowed values are ‘gzip’, ‘bz2’, ‘xz’
-
-
class
epitopepredict.base.
TEpitopePredictor
(data=None, **kwargs)[source]¶ Bases:
epitopepredict.base.Predictor
Predictor using TepitopePan QM method
-
epitopepredict.base.
clean_sequence
(seq)[source]¶ clean a sequence of invalid characters before prediction
-
epitopepredict.base.
compare_predictors
(p1, p2, by='allele', cutoff=5, n=2)[source]¶ Compare predictions from 2 different predictors.
Parameters: - p1, p2 – predictors with prediction results for the same set of sequences and alleles
- by – how to group the correlation plots
-
epitopepredict.base.
get_coords
(df)[source]¶ Get start end coords from position and length of peptides
-
epitopepredict.base.
get_overlapping
(index, s, length=9, cutoff=25)[source]¶ Get all mutually overlapping kmers within a cutoff area
-
epitopepredict.base.
get_predictor
(name='tepitope', **kwargs)[source]¶ Get a predictor object using its name. Valid predictor names are held in the predictors attribute.
-
epitopepredict.base.
get_quantiles
(predictor)[source]¶ Get quantile score values per allele in set of predictions. Used for making pre-defined cutoffs.
Parameters: predictor – predictor with set of predictions
-
epitopepredict.base.
plot_summary_heatmap
(p, kind='default', name=None)[source]¶ Plot heatmap of binders using summary dataframe.
-
epitopepredict.base.
read_defaults
()[source]¶ Get some global settings such as program paths from config file
-
epitopepredict.base.
reshape_data
(pred, peptides=None, name=None, values='score')[source]¶ Create summary table per binder/allele with cutoffs applied.
Parameters: - pred – predictor with data
- cutoff – percentile cutoff
- n – number of alleles
-
epitopepredict.base.
results_from_csv
(path=None, names=None, compression='infer', file_limit=None)[source]¶ Load results for multiple csv files in a folder or a single file.
Parameters: - path – name of a csv file or directory with one or more csv files
- names – names of proteins to load
- file_limit – load only this number of proteins
-
epitopepredict.base.
set_netmhcpan_cmd
(path=None)[source]¶ Setup the netmhcpan command to point directly to the binary. This is a workaround for running inside snaps. Avoids using the tcsh script.
-
epitopepredict.base.
split_peptides
(df, length=9, seqkey='sequence', newcol='peptide')[source]¶ Split sequences in a dataframe into peptide fragments
epitopepredict.cluster module¶
epitopepredict.config module¶
epitopepredict config Created March 2016 Copyright (C) Damien Farrell
-
epitopepredict.config.
check_options
(opts)[source]¶ Check for missing default options in dict. Meant to handle incomplete config files
-
epitopepredict.config.
create_config_parser_from_dict
(data=None, sections=['base', 'iedbtools'], **kwargs)[source]¶ Helper method to create a ConfigParser from a dict of the form shown in baseoptions
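The general technique of building a ConfigParser from a nested dict can be sketched with the standard library alone. The layout of baseoptions is not shown in this reference, so the dict below (one sub-dict per section) and its option names are hypothetical, as is the helper name `parser_from_dict`.

```python
from configparser import ConfigParser


def parser_from_dict(data, sections=("base", "iedbtools")):
    """Build a ConfigParser holding only the requested sections."""
    cp = ConfigParser()
    cp.optionxform = str  # keep option names case-sensitive
    for section in sections:
        cp.add_section(section)
        for key, value in data.get(section, {}).items():
            # ConfigParser stores every value as a string
            cp.set(section, key, str(value))
    return cp


# hypothetical option names, for illustration only
options = {
    "base": {"predictors": "tepitope", "cutoff": 0.95},
    "iedbtools": {"iedbmhc1_path": ""},
}
cp = parser_from_dict(options)
print(cp.sections(), cp.get("base", "cutoff"))
```

Values come back as strings, so numeric options need converting on read, for example with `cp.getfloat("base", "cutoff")`.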
epitopepredict.neo module¶
Command line script for neo epitope prediction Created March 2018 Copyright (C) Damien Farrell
-
class
epitopepredict.neo.
NeoEpitopeWorkFlow
(opts={})[source]¶ Bases:
object
Class for implementing a neo epitope workflow.
-
epitopepredict.neo.
check_ensembl
(release='75')[source]¶ Check pyensembl ref genome cached. Needed for running in snap
-
epitopepredict.neo.
combine_wt_scores
(x, y, key)[source]¶ Combine mutant peptide and matching wt/self binding scores from a set of predictions. Assumes both dataframes were run with the same alleles.
Parameters: - x, y – pandas dataframes with matching prediction results
- key –
-
epitopepredict.neo.
dataframe_to_vcf
(df, outfile)[source]¶ Write a dataframe of variants to a simple vcf file. Dataframe requires the following columns: ‘#CHROM’, ‘POS’, ‘ID’, ‘REF’, ‘ALT’
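What a simple vcf file built from those five columns might look like is sketched below using plain dicts instead of a dataframe. The remaining fixed VCF columns (QUAL, FILTER, INFO) are filled with the missing value `.`; the file format version line and the helper name `write_simple_vcf` are assumptions, not taken from the library.

```python
import io

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def write_simple_vcf(rows, handle):
    """Write rows (dicts keyed by column name) as tab-separated VCF lines."""
    handle.write("##fileformat=VCFv4.2\n")
    handle.write("\t".join(VCF_COLUMNS) + "\n")
    for row in rows:
        # fields not supplied are written as '.', the VCF missing value
        fields = [str(row.get(col, ".")) for col in VCF_COLUMNS]
        handle.write("\t".join(fields) + "\n")


rows = [{"#CHROM": "1", "POS": 12345, "ID": "var1", "REF": "A", "ALT": "G"}]
buf = io.StringIO()
write_simple_vcf(rows, buf)
vcf_text = buf.getvalue()
print(vcf_text)
```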
-
epitopepredict.neo.
effects_to_pickle
(effects, filename)[source]¶ serialize variant effects collections
-
epitopepredict.neo.
fetch_ensembl_release
(path=None, release='75')[source]¶ Get pyensembl genome files
-
epitopepredict.neo.
find_matches
(df, blastdb, cpus=4, verbose=False)[source]¶ Get similarity measures for peptides to a self proteome. Does a local blast to the proteome and finds most similar matches. These can then be scored.
Parameters: - df – dataframe of peptides
- blastdb – path to protein blastdb
Returns: dataframe with extra columns ‘sseq’, ‘mismatch’
-
epitopepredict.neo.
get_closest_match
(x)[source]¶ Create columns with closest matching peptide. If no wt peptide use self match. vector method
-
epitopepredict.neo.
get_closest_matches
(df, verbose=False, cpus=1)[source]¶ Find peptide similarity metrics
-
epitopepredict.neo.
get_mutant_sequences
(variants=None, effects=None, reference=None, peptides=True, drop_duplicates=True, length=11, verbose=False)[source]¶ Get mutant proteins or peptide fragments from vcf or maf file.
Parameters: - variants – varcode variant collection
- effects – non-synonymous effects, alternative to variants
- peptides – get peptide fragments around mutation
Returns: pandas dataframe with mutated peptide sequence and source information
-
epitopepredict.neo.
get_variants_effects
(variants, verbose=False, gene_expression_dict=None)[source]¶ Get all effects from a list of variants.
Returns: list of varcode variant effect objects
-
epitopepredict.neo.
load_variants
(vcf_file=None, maf_file=None, max_variants=None)[source]¶ Load variants from vcf file
-
epitopepredict.neo.
make_blastdb
(url, name=None, filename=None, overwrite=False)[source]¶ Download protein sequences and make a blast db. Uses datacache module.
-
epitopepredict.neo.
peptides_from_effect
(eff, length=11, peptides=True, verbose=False)[source]¶ Get mutated peptides from a single effect object.
Returns: dataframe with peptides and variant info
-
epitopepredict.neo.
predict_binding
(df, predictor='netmhcpan', alleles=[], verbose=False, cpus=1, cutoff=0.95, cutoff_method='default')[source]¶ Predict binding scores for mutated and wt peptides (if present) from supplied variants.
Parameters: - df – pandas dataframe with peptide sequences, requires at least 2 columns: ‘peptide’ (the mutant peptide) and ‘wt’ (a corresponding wild type peptide). This data could be generated from get_mutant_sequences or from an external program
- predictor – mhc binding prediction method
- alleles – list of alleles
Returns: dataframe with mutant and wt binding scores for all alleles
-
epitopepredict.neo.
run_vep
(vcf_file, out_format='vcf', assembly='GRCh38', cpus=4, path=None)[source]¶ Run ensembl VEP on a vcf file for use with pvacseq. see https://www.ensembl.org/info/docs/tools/vep/script/index.html
-
epitopepredict.neo.
score_peptides
(df, rf=None)[source]¶ Score peptides with a classifier. Returns a prediction probability.
-
epitopepredict.neo.
variants_from_csv
(csv_file, sample_id=None, reference=None)[source]¶ Variants from csv file.
Parameters: - csv_file – csv file with following column names- chromosome, position, reference_allele, alt_allele, gene_name, transcript_id, sample_id
- sample_id – if provided, select variants only for this id
- reference – ref genome used for variant calling
epitopepredict.peptutils module¶
Module implementing peptide sequence/structure utilities. Created March 2013 Copyright (C) Damien Farrell
-
epitopepredict.peptutils.
compare_anchor_positions
(x1, x2)[source]¶ Check if anchor positions in 9-mers are mutated
-
epitopepredict.peptutils.
create_fragments
(protfile=None, seq=None, length=9, overlap=1, quiet=True)[source]¶ generate peptide fragments from a sequence
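Fragment generation is essentially a sliding window over the sequence. The sketch below is a generic version, not the library code: it uses a `step` argument for the offset between successive windows, and how that maps onto the `overlap` parameter above is an assumption left to the reader to check.

```python
def fragments(seq, length=9, step=1):
    """Return all windows of the given length, moving step residues
    at a time. Sequences shorter than length yield an empty list."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


seq = "MKTAYIAKQRQ"
all_9mers = fragments(seq, length=9)
every_other = fragments(seq, length=9, step=2)
print(all_9mers)
print(every_other)
```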
-
epitopepredict.peptutils.
create_random_peptides
(size=100, length=9)[source]¶ Create random peptide structures of given length
-
epitopepredict.peptutils.
create_random_sequences
(size=100, length=9)[source]¶ Create library of all possible peptides given length
-
epitopepredict.peptutils.
get_AAfraction
(seq, amino_acids=None)[source]¶ Get fraction of given amino acids in a sequence
-
epitopepredict.peptutils.
get_AAsubstitutions
(template)[source]¶ Get all the possible sequences from substituting every AA into the given sequence at each position. This gives a total of 19 by n amino acid positions.
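The count of 19 by n follows from replacing each of the n positions with every one of the 19 other standard amino acids. A minimal standalone sketch, with `single_substitutions` as a hypothetical name and assuming the template contains only the 20 standard residues:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(template):
    """Return every sequence that differs from template at exactly
    one position, using the 20 standard amino acids."""
    variants = []
    for i, original in enumerate(template):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants.append(template[:i] + aa + template[i + 1:])
    return variants


variants = single_substitutions("ACD")
print(len(variants))
```

For a 3-residue template this gives 19 x 3 variants, and for a 9-mer it would give 171.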
epitopepredict.plotting module¶
epitopepredict plotting Created February 2016 Copyright (C) Damien Farrell
-
epitopepredict.plotting.
binders_to_coords
(df)[source]¶ Convert binder results to dict of coords for plotting
-
epitopepredict.plotting.
bokeh_pie_chart
(df, title='', radius=0.5, width=400, height=400, palette='Spectral')[source]¶ Bokeh pie chart
-
epitopepredict.plotting.
bokeh_plot_bar
(preds, name=None, allele=None, title='', width=None, height=100, palette='Set1', tools=True, x_range=None)[source]¶ Plot bars combining one or more prediction results for a set of peptides in a protein/sequence
-
epitopepredict.plotting.
bokeh_plot_grid
(pred, name=None, width=None, palette='Blues', **kwargs)[source]¶ Plot heatmap of binding results for a predictor.
-
epitopepredict.plotting.
bokeh_plot_sequence
(preds, name=None, n=2, cutoff=0.95, cutoff_method='default', width=1000, color_sequence=False, title='')[source]¶ Plot sequence view of binders
-
epitopepredict.plotting.
bokeh_plot_tracks
(preds, title='', n=2, name=None, cutoff=0.95, cutoff_method='default', width=None, height=None, x_range=None, tools=True, palette='Set1', seqdepot=None, exp=None)[source]¶ Plot binding predictions as parallel tracks of blocks for each allele. This uses Bokeh.
Parameters: - title – plot title
- n – min alleles to display
- name – name of protein to show if more than one in data
Returns: a bokeh figure for embedding or displaying in a notebook
-
epitopepredict.plotting.
get_seq_from_binders
(P, name=None)[source]¶ Get sequence from binder data. Probably better to store the sequences in the object?
-
epitopepredict.plotting.
get_seqdepot_annotation
(genome, key='pfam27')[source]¶ Get seqdepot annotations for a set of proteins in dataframe.
-
epitopepredict.plotting.
plot_bars
(P, name, chunks=1, how='median', cutoff=20, color='black')[source]¶ Bar plots for sequence using median/mean/total scores.
Parameters: - P – predictor with data
- name – name of protein sequence
- chunks – break sequence up into 1 or more chunks
- how – method to calculate score bar value
- perc – percentile cutoff to show peptide
-
epitopepredict.plotting.
plot_bcell
(plot, pred, height, ax=None)[source]¶ Line plot of iedb bcell results
-
epitopepredict.plotting.
plot_binder_map
(P, name, values='rank', cutoff=20, chunks=1, cmap=None)[source]¶ Plot heatmap of binders above a cutoff by rank or score.
Parameters: - P – predictor object with data
- name – name of protein to plot
- values – data column to use for plot data, ‘score’ or ‘rank’
- cutoff – cutoff if using rank as values
- chunks – number of plots to split the sequence into
-
epitopepredict.plotting.
plot_heatmap
(df, ax=None, figsize=(6, 6), **kwargs)[source]¶ Plot a generic heatmap
-
epitopepredict.plotting.
plot_multiple
(preds, names, kind='tracks', regions=None, genome=None, **kwargs)[source]¶ Plot results for multiple proteins
-
epitopepredict.plotting.
plot_overview
(genome, coords=None, cols=2, colormap='Paired', legend=True, figsize=None)[source]¶ Plot regions of interest in a group of protein sequences. Useful for seeing how your binders/epitopes are distributed in a small genome or subset of genes.
Parameters: - genome – dataframe with protein sequences
- coords – a list/dict of tuple lists of the form {protein name: [(start,length)..]}
- cols – number of columns for plot, integer
-
epitopepredict.plotting.
plot_regions
(coords, ax, color='red', label='', alpha=0.6)[source]¶ Highlight regions in a prot binder plot
-
epitopepredict.plotting.
plot_seqdepot
(annotation, ax)[source]¶ Plot seqdepot annotations - replace with generic plot coords track
-
epitopepredict.plotting.
plot_tracks
(preds, name, n=1, cutoff=0.95, cutoff_method='default', regions=None, legend=False, colormap='Paired', figsize=None, ax=None, **kwargs)[source]¶ Plot binders as bars per allele using matplotlib.
Parameters: - preds – list of one or more predictors
- name – name of protein to plot
- n – number of alleles binder should be found in to be displayed
- cutoff – percentile cutoff to determine binders to show
epitopepredict.sequtils module¶
Sequence utilities and genome annotation methods Created November 2013 Copyright (C) Damien Farrell
-
epitopepredict.sequtils.
blast_sequences
(database, seqs, labels=None, **kwargs)[source]¶ Blast a set of sequences to a local or remote blast database
Parameters: - database – local or remote blast db name; ‘nr’, ‘refseq_protein’, ‘pdb’ and ‘swissprot’ are valid remote dbs
- seqs – sequences to query, list of strings or Bio.SeqRecords
- labels – list of id names for sequences, optional but recommended
Returns: pandas dataframe with top blast results
Check genbank tags to make sure they are not empty
-
epitopepredict.sequtils.
clustal_alignment
(filename=None, seqs=None, command='clustalw')[source]¶ Align 2 sequences with clustal
-
epitopepredict.sequtils.
convert_sequence_format
(infile, outformat='embl')[source]¶ convert sequence files using SeqIO
-
epitopepredict.sequtils.
dataframe_to_fasta
(df, seqkey='translation', idkey='locus_tag', descrkey='description', outfile='out.faa')[source]¶ Genbank features to fasta file
-
epitopepredict.sequtils.
dataframe_to_seqrecords
(df, seqkey='sequence', idkey='id')[source]¶ dataframe to list of Bio.SeqRecord objects
-
epitopepredict.sequtils.
distance_tree
(filename=None, seqs=None, ref=None)[source]¶ Basic phylogenetic tree for an alignment
-
epitopepredict.sequtils.
fasta_format_from_feature
(feature)[source]¶ Get fasta formatted sequence from a genome feature
-
epitopepredict.sequtils.
fasta_to_dataframe
(infile, header_sep=None, key='locus_tag', seqkey='translation')[source]¶ Get fasta proteins into dataframe
-
epitopepredict.sequtils.
features_to_dataframe
(recs, cds=False, select='all')[source]¶ Get genome records from a biopython features object into a dataframe. Returns a dataframe with a row for each cds/entry.
Parameters: - recs – seqrecords object
- cds – only return cds
- select – ‘first’ record or ‘all’
-
epitopepredict.sequtils.
fetch_protein_sequences
(searchterm, filename='found.fa')[source]¶ Fetch protein seqs using ncbi esearch and save results to a fasta file.
Parameters: - searchterm – entrez search term
- filename – fasta file name to save results
Returns: sequence records as a dataframe
-
epitopepredict.sequtils.
genbank_to_dataframe
(infile, cds=False)[source]¶ Get genome records from a genbank file into a dataframe. Returns a dataframe with a row for each cds/entry.
-
epitopepredict.sequtils.
get_blast_results
(filename)[source]¶ Get blast results into dataframe. Assumes column names from local_blast method.
Returns: dataframe
-
epitopepredict.sequtils.
get_genes_by_location
(genome, feature, within=20)[source]¶ Gets all features within a given distance of a gene
-
epitopepredict.sequtils.
get_identity
(aln)[source]¶ Get sequence identity of alignment for overlapping region only
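One common way to compute identity over an overlapping region is to compare only the alignment columns where neither sequence has a gap. The sketch below takes that interpretation, which is an assumption about what "overlapping region" means here; it works on two aligned strings rather than an alignment object, and `percent_identity` is a hypothetical name.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


identity = percent_identity("MKT-AYIA", "MRTLAY-A")
print(round(identity, 1))
```

In this example two columns contain a gap and are skipped, leaving six compared columns of which five match.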
-
epitopepredict.sequtils.
get_sequence
(genome, name)[source]¶ Get the sequence for a protein in a dataframe with genbank/sequence data
-
epitopepredict.sequtils.
get_translation
(feature, genome, cds=True)[source]¶ Check the translation of a cds feature
-
epitopepredict.sequtils.
index_genbank_features
(gb_record, feature_type, qualifier)[source]¶ Index features by qualifier value for easy access
-
epitopepredict.sequtils.
local_blast
(database, query, output=None, maxseqs=50, evalue=0.001, compress=False, cmd='blastp', cpus=2, show_cmd=False, **kwargs)[source]¶ Blast a local database.
Parameters: - database – local blast db name
- query – sequences to query, list of strings or Bio.SeqRecords
Returns: pandas dataframe with top blast results
-
epitopepredict.sequtils.
muscle_alignment
(filename=None, seqs=None)[source]¶ Align 2 sequences with muscle
-
epitopepredict.sequtils.
needle_alignment
(seq1, seq2, outfile='needle.txt')[source]¶ Align 2 sequences with needle
-
epitopepredict.sequtils.
remote_blast
(db, query, maxseqs=50, evalue=0.001, **kwargs)[source]¶ Remote blastp. :param query: fasta file with sequence to blast :param db: database to use - nr, refseq_protein, pdb, swissprot
-
epitopepredict.sequtils.
show_alignment
(aln, diff=False, offset=0)[source]¶ Show a sequence alignment. :param aln: alignment :param diff: whether to show differences
-
epitopepredict.sequtils.
show_alignment_html
(alnrows, seqs, width=80, fontsize=15, label='name')[source]¶ Get html display of sub-sequences on multiple protein alignment. :param alnrows: a dataframe of aligned sequences :param seqs: sub-sequences/epitopes to draw if present :param label: key from dataframe to use as label for sequences
Returns: html code
epitopepredict.tepitope module¶
Module that implements the TEPITOPEpan method. Includes methods for pickpocket and pseudosequence similarity calculation. References: [1] L. Zhang, Y. Chen, H.-S. Wong, S. Zhou, H. Mamitsuka, and S. Zhu, “TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules,” PLoS One, vol. 7, no. 2, p. e30483, Jan. 2012. [2] H. Zhang, O. Lund, and M. Nielsen, “The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding,” Bioinformatics, vol. 25, no. 10, pp. 1293-9, May 2009. Created January 2014 Copyright (C) Damien Farrell
-
epitopepredict.tepitope.
compare
(file1, file2, alnindex, reduced=True)[source]¶ All vs all for 2 sets of sequence files
-
epitopepredict.tepitope.
compare_alleles
(alleles1, alleles2, alnindex, reduced=True, cutoff=0.25, matrix=None, matrix_name='blosum62')[source]¶ Compare 2 sets of alleles for pseudo-seq distances
-
epitopepredict.tepitope.
compare_ref
(query1, query2, ref, alnindex)[source]¶ Compare different alleles distances to reference
-
epitopepredict.tepitope.
compare_tepitope_alleles
(alnindex)[source]¶ Compare a set of alleles to Tepitope library HLAs
-
epitopepredict.tepitope.
convert_allele_names
(seqfile)[source]¶ Convert long IPD names to common form. :param seqfile: fasta sequence file
Returns: new list of seqrecords
-
epitopepredict.tepitope.
create_virtual_pssm
(allele)[source]¶ Create virtual matrix from pickpocket profile weights
-
epitopepredict.tepitope.
generate_pssm
(expdata)[source]¶ Create pssm for known binding data given a set of n-mers and binding score
-
epitopepredict.tepitope.
get_allele_pocket_sequences
(allele)[source]¶ Convenience function for getting the pocket amino acids of an allele
-
epitopepredict.tepitope.
get_pockets_pseudo_sequence
(query, offset=28)[source]¶ Get pockets pseudo-seq from sequence and pocket residues. :param query: query sequence :param offset: seq numbering offset of alignment numbering to pickpocket residue values
-
epitopepredict.tepitope.
get_pseudo_sequence
(query, positions=None, offset=28)[source]¶ Get non redundant pseudo-sequence for a query. Assumes input is a sequence from alignment of MHC genes.
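A pseudo-sequence is built by picking selected residue positions out of an aligned sequence. The sketch below shows that idea only. The helper name pseudo_sequence is hypothetical, and both the 1-based position convention and the way offset is added to each position are assumptions, not documented behaviour of get_pseudo_sequence.

```python
def pseudo_sequence(aligned_seq, positions, offset=28):
    """Join the residues found at the given positions of an aligned sequence.

    Assumptions (not taken from the library): positions are 1-based, offset is
    added to each position to reach the alignment index, and repeated
    positions are used only once so the result is non redundant.
    """
    unique_positions = list(dict.fromkeys(positions))  # drop repeats, keep order
    return "".join(aligned_seq[p + offset - 1] for p in unique_positions)


# toy aligned sequence: 28 leading columns followed by five residues
toy = "A" * 28 + "WYRKE"
pseudo = pseudo_sequence(toy, [1, 3, 3, 5])
```

With this toy input, positions 1, 3 and 5 map to the residues W, R and E after the 28-column offset.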
-
epitopepredict.tepitope.
get_scores
(pssm, sequence=None, peptides=None, length=11, overlap=1)[source]¶ Score a sequence by splitting it into separate fragments and scoring each one
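Scoring fragments against a position-specific scoring matrix can be illustrated with a small stand-alone sketch. It is not the implementation of get_scores: the function name score_fragments, the list-of-dicts matrix layout, the step argument, and the zero score for unlisted residues are assumptions made for the example.

```python
def score_fragments(pssm, sequence, length=9, step=1):
    """Slide a window over the sequence and sum per-position matrix scores.

    pssm is assumed to be a list with one dict per fragment position, mapping
    an amino acid letter to its score; letters not listed score 0.0.
    """
    results = []
    for start in range(0, len(sequence) - length + 1, step):
        fragment = sequence[start:start + length]
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(fragment))
        results.append((start, fragment, score))
    return results


# toy 3-position matrix and a short sequence
toy_pssm = [{"A": 1.0}, {"C": 2.0}, {"D": -1.0}]
scores = score_fragments(toy_pssm, "ACDA", length=3)
```

Each tuple holds the start index, the fragment and its summed score, so the best scoring fragment can be found with max(scores, key=lambda r: r[2]).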
-
epitopepredict.tepitope.
get_similarities
(allele, refalleles, alnindex, matrix)[source]¶ Get distances between a query and set of ref pseudo-seqs
-
epitopepredict.tepitope.
pickpocket
(pos, allele)[source]¶ Derive weights for a query allele using pickpocket method. This uses the pocket pseudosequences to determine similarity to the reference. This relies on the DRB alignment present in the tepitope folder.
Parameters: - pos – pocket position
- allele – query allele
Returns: set of weights for library alleles at this position
-
epitopepredict.tepitope.
show_pocket_residues
(pdbfile)[source]¶ Test to show the pocket residues in a pdb structure
epitopepredict.tests module¶
MHC prediction unit tests Created September 2015 Copyright (C) Damien Farrell
epitopepredict.utilities module¶
Utilities for epitopepredict Created March 2013 Copyright (C) Damien Farrell
-
epitopepredict.utilities.
filter_iedb_file
(filename, field, search)[source]¶ Return filtered iedb data
-
epitopepredict.utilities.
find_files
(path, ext='txt')[source]¶ List files in a dir of a specific type
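Listing files of one type in a directory can be done with the standard library alone. The following is a minimal sketch under assumptions: the helper name list_files is hypothetical, it does not recurse into subdirectories, and whether find_files itself recurses or sorts its results is not stated here.

```python
import glob
import os


def list_files(path, ext="txt"):
    """Return sorted paths of files in path whose names end with .ext."""
    pattern = os.path.join(path, "*." + ext)
    return sorted(glob.glob(pattern))


# example: list csv files in the current directory
csv_files = list_files(".", ext="csv")
```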
-
epitopepredict.utilities.
get_sequencefrom_pdb
(pdbfile, chain='C', index=0)[source]¶ Get AA sequence from PDB
-
epitopepredict.utilities.
read_iedb
(filename, key='Epitope ID')[source]¶ Load iedb peptidic csv file and return dataframe
epitopepredict.web module¶
epitopepredict, methods for supporting web app. Created Sep 2017 Copyright (C) Damien Farrell
-
epitopepredict.web.
column_to_url
(df, field, path)[source]¶ Add urls to specified field in a dataframe by prepending the supplied path.
-
epitopepredict.web.
create_figures
(preds, name='', kind='tracks', cutoff=5, n=2, cutoff_method='default', **kwargs)[source]¶ Get plots of binders for single protein/sequence
-
epitopepredict.web.
dataframes_to_html
(data, classes='')[source]¶ Convert dictionary of dataframes to html tables
-
epitopepredict.web.
get_file_lists
(path)[source]¶ Get list of available prediction results in the given path. Tries to check for each possible predictor.
-
epitopepredict.web.
get_predictors
(path, name=None)[source]¶ Get a set of predictors under a results path for all or a specific protein.
-
epitopepredict.web.
get_results_tables
(path, name=None, promiscuous=True, limit=None, **kwargs)[source]¶ Get binder results from a results path. :param path: path to results :param name: name of particular protein/sequence :param promiscuous: get all binders or just promiscuous
-
epitopepredict.web.
get_scrollable_table
(df)[source]¶ Return a scrollable table as a div element to be placed in web page
-
epitopepredict.web.
get_summary_tables
(path, limit=None, **kwargs)[source]¶ Get binder results summary for all proteins in path. :param path: path to results
-
epitopepredict.web.
sequence_to_html_grid
(preds, classes='', **kwargs)[source]¶ Put aligned or multiple identical rows in dataframe and convert to grid of aas as html table