Aligner
- class pyfamsa.Aligner
A single FAMSA aligner.
- scoring_matrix
The scoring matrix used for scoring alignments.
- Type:
ScoringMatrix
New in version 0.4.0: The scoring_matrix attribute.
- __init__(*, threads=0, guide_tree='sl', tree_heuristic=None, n_refinements=100, keep_duplicates=False, refine=None, scoring_matrix=None, medoid_threshold=0, medoid_seeds=100, medoid_sample=2000, medoid_evaluations=1, cluster_fraction=0.1, cluster_iters=2)
Create a new aligner with the given configuration.
- Keyword Arguments:
threads (int) – The number of threads to use for parallel computations. If 0 is given (the default), use os.cpu_count to spawn one thread per CPU on the host machine.
guide_tree (str) – The method for building the guide tree. Supported values are: sl for MST+Prim single linkage, slink for SLINK single linkage, upgma for UPGMA, nj for neighbour joining.
tree_heuristic (str or None) – The heuristic to use for constructing the tree. Supported values are: medoid for medoid trees, part for part trees, or None to disable heuristics.
n_refinements (int) – The number of refinement iterations to run.
keep_duplicates (bool) – Set to True to avoid discarding duplicate sequences before building trees or alignments.
refine (bool or None) – Set to True to force refinement, False to disable refinement, or leave as None to disable refinement automatically for sets of more than 1000 sequences.
scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for scoring alignments. By default, the PFASUM43 matrix is used, as in the C++ FAMSA implementation since v2.3.0.
medoid_threshold (int) – The minimum number of sequences a set must contain for medoid trees to be used, if enabled with tree_heuristic.
medoid_seeds (int) – The number of trees to select for seeding the medoid trees with PartTree.
medoid_sample (int) – The number of sequences to use to perform clustering.
medoid_evaluations (int) – The number of evaluations to perform while building the medoid trees.
cluster_fraction (float) – The fraction of data points to select to estimate a guide tree with the PartTree algorithm.
cluster_iters (int) – The number of iterations to identify starting nodes while estimating a guide tree with the PartTree algorithm.
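The defaults for threads, refine and the medoid settings interact, so it can help to see them resolved in one place. The following is a stdlib-only sketch that mirrors the documented rules (threads=0 means one thread per CPU, refine=None switches refinement off above 1000 sequences, medoid trees require tree_heuristic='medoid' and at least medoid_threshold sequences). resolve_options is a hypothetical helper, not pyfamsa's internal code, and the fallback to a single thread when os.cpu_count() returns None is an added assumption.

```python
import os


def resolve_options(n_sequences, threads=0, refine=None,
                    tree_heuristic=None, medoid_threshold=0):
    """Hypothetical helper mirroring the documented Aligner defaults."""
    # threads=0: one thread per CPU on the host, as reported by os.cpu_count().
    # Assumption: fall back to 1 thread if os.cpu_count() returns None.
    n_threads = threads if threads > 0 else (os.cpu_count() or 1)
    # refine=None: refinement is disabled automatically above 1000 sequences.
    do_refine = refine if refine is not None else n_sequences <= 1000
    # Medoid trees only when requested and the set is large enough.
    use_medoid = tree_heuristic == "medoid" and n_sequences >= medoid_threshold
    return n_threads, do_refine, use_medoid
```

For example, resolve_options(2500) reports refinement as disabled, while resolve_options(2500, refine=True) forces it back on, matching the description of the refine argument.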
New in version 0.4.0: The scoring_matrix argument.
Changed in version 0.6.0: Default scoring_matrix changed from MIQS to PFASUM43.
Changed in version 0.6.1: scoring_matrix supports alphabets that are subsets of FAMSA_ALPHABET.
New in version 0.7.0: The medoid_seeds, medoid_sample, medoid_evaluations, cluster_fraction and cluster_iters arguments.
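upgma is one of the documented guide_tree values. As an illustration of the textbook UPGMA rule (repeatedly merge the two closest clusters, then set the distance from the merged cluster to every other cluster to the size-weighted average), here is a stdlib-only sketch over a precomputed distance table. It is not pyfamsa's or FAMSA's implementation: the upgma name, the frozenset-keyed distance table and the nested-tuple output (instead of a GuideTree) are all assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a UPGMA tree as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Hypothetical helper for illustration only.
    """
    # Each active cluster: label -> (subtree, number of leaves).
    clusters = {name: (name, 1) for name in names}
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(clusters) > 1:
        # Merge the closest pair (ties broken by label order for determinism).
        a, b = sorted(min(d, key=lambda p: (d[p], sorted(p))))
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # UPGMA: distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        # Forget every distance that involves one of the merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        clusters[merged] = ((ta, tb), na + nb)
    ((tree, _),) = clusters.values()
    return tree
```

With distances a–b = 2, a–c = b–c = 6 and every distance to d equal to 10, the sketch first joins a and b, then adds c, then d.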
- align(sequences)
Align sequences together.
Example
>>> from pyfamsa import Aligner, Sequence
>>> aligner = Aligner()
>>> seqs = [Sequence(b't1', b'MMYK'), Sequence(b't2', b'MYKLP')]
>>> ali = aligner.align(seqs)
>>> list(ali)
[GappedSequence(b't1', b'MMYK--'), GappedSequence(b't2', b'-MYKLP')]
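In the example output, both rows have the same width, and deleting the '-' gaps from each row gives back the corresponding input sequence. The stdlib-only sketch below checks those two properties. It uses plain (id, bytes) pairs in place of pyfamsa's Sequence and GappedSequence objects, and check_alignment is a hypothetical helper, not part of pyfamsa.

```python
def check_alignment(originals, aligned, gap=b"-"):
    """Check basic invariants of an alignment given as (id, bytes) pairs.

    Hypothetical helper; plain tuples stand in for Sequence/GappedSequence.
    """
    rows = dict(aligned)
    # Every input identifier must appear exactly once in the output.
    if len(rows) != len(aligned) or set(rows) != {i for i, _ in originals}:
        return False
    # All rows must span the same number of columns.
    if len({len(seq) for seq in rows.values()}) != 1:
        return False
    # Deleting the gap symbols must give back each input sequence.
    return all(rows[i].replace(gap, b"") == seq for i, seq in originals)
```

Applied to the sequences and output shown in the example above, the check passes.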
- Parameters:
sequences (iterable of Sequence) – An iterable yielding the digitized sequences to align.
- Returns:
Alignment – The aligned sequences, in aligned format.
- Raises:
ValueError – When the given sequences contain symbols that are not supported by the Aligner.scoring_matrix.
RuntimeError – When the internal FAMSA failed to align the sequences.
Changed in version 0.6.1: Sequences are now checked against the scoring_matrix alphabet.
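The ValueError described above comes from checking every symbol of every input against the alphabet of the scoring matrix. A minimal stdlib-only sketch of such a check follows. The alphabet used here (the 20 standard amino acids) is an assumption for illustration and is not pyfamsa's FAMSA_ALPHABET or the alphabet of any particular ScoringMatrix; check_sequences is a hypothetical helper.

```python
# Assumption: the 20 standard amino acids only. pyfamsa's FAMSA_ALPHABET and
# the alphabets of its scoring matrices may differ from this set.
ALPHABET = frozenset(b"ARNDCQEGHILKMFPSTWYV")


def check_sequences(sequences, alphabet=ALPHABET):
    """Raise ValueError if a sequence uses a symbol outside `alphabet`.

    `sequences` yields (id, bytes) pairs. Hypothetical helper for illustration.
    """
    for seq_id, seq in sequences:
        # Iterating over bytes yields ints, so compare sets of byte values.
        bad = sorted(set(seq) - alphabet)
        if bad:
            symbols = bytes(bad).decode()
            raise ValueError(f"sequence {seq_id!r} contains unsupported symbols: {symbols}")
```

Validating up front like this gives a clear Python error before any alignment work starts, which is the behaviour the 0.6.1 change note describes.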
- align_profiles(profile1, profile2)
Align two profiles together.
Profile-profile alignment computes a new alignment using sequences from the two input alignments while preserving the columns of each profile.
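- Parameters:
profile1 (Alignment) – The first profile to align.
profile2 (Alignment) – The second profile to align.
- Returns:
Alignment – The resulting profile-profile alignment.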
New in version 0.5.0.
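A textbook way to align two profiles while keeping each profile's columns intact is Needleman-Wunsch dynamic programming over whole columns: a column pair is scored by summing a substitution score over all residue pairs (sum-of-pairs), and a gap inserts an all-gap column into one profile. The stdlib-only sketch below shows that technique with hypothetical scores (match +1, mismatch -1, gap column -1) instead of a real substitution matrix. It is not FAMSA's profile-profile algorithm, and align_columns is a hypothetical name.

```python
def column_score(col1, col2, match=1, mismatch=-1, gap="-"):
    """Sum-of-pairs score of two profile columns; pairs with a gap score 0."""
    score = 0
    for a in col1:
        for b in col2:
            if a != gap and b != gap:
                score += match if a == b else mismatch
    return score


def align_columns(p1, p2, gap_penalty=-1, gap="-"):
    """Align two profiles (lists of equal-length rows) column by column."""
    cols1, cols2 = list(zip(*p1)), list(zip(*p2))
    n, m = len(cols1), len(cols2)
    # dp[i][j]: best score aligning the first i columns of p1 with the first j of p2.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(cols1[i - 1], cols2[j - 1], gap=gap),
                dp[i - 1][j] + gap_penalty,
                dp[i][j - 1] + gap_penalty,
            )
    # Traceback emits whole columns, so existing columns are never split.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + column_score(
            cols1[i - 1], cols2[j - 1], gap=gap
        ):
            out1.append(cols1[i - 1])
            out2.append(cols2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap_penalty:
            out1.append(cols1[i - 1])
            out2.append((gap,) * len(p2))
            i -= 1
        else:
            out1.append((gap,) * len(p1))
            out2.append(cols2[j - 1])
            j -= 1
    out1.reverse()
    out2.reverse()
    # Transpose the aligned columns back into rows: profile1 rows, then profile2 rows.
    return ["".join(r) for r in zip(*out1)] + ["".join(r) for r in zip(*out2)]
```

Removing the inserted all-gap columns from the first rows of the result gives back the first profile unchanged, which is what "preserving the columns of each profile" means in practice.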
- build_tree(sequences)
Build a tree from the given sequences.
- Parameters:
sequences (iterable of Sequence) – An iterable yielding the digitized sequences to build a tree from.
- Returns:
GuideTree – The guide tree obtained from the sequences.
- Raises:
ValueError – When the given sequences contain symbols that are not supported by the Aligner.scoring_matrix.
RuntimeError – When the internal FAMSA failed to align the sequences.
Changed in version 0.6.1: Sequences are now checked against the scoring_matrix alphabet.
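The default guide_tree='sl' is documented as MST+Prim single linkage. The stdlib-only sketch below illustrates that idea: Prim's algorithm builds a minimum spanning tree over the sequences, and merging the MST edges in increasing order of weight (with a union-find) yields the single-linkage hierarchy. The distance function is supplied by the caller and is an assumption (FAMSA's own sequence similarity measure is not reproduced here), the result is a nested tuple rather than a GuideTree, and single_linkage is a hypothetical name.

```python
def single_linkage(names, dist):
    """Single-linkage tree (nested tuples) from an MST built with Prim's algorithm.

    `dist(a, b)` returns the distance between two leaves. Hypothetical helper.
    """
    n = len(names)
    # Prim: grow the MST from leaf 0, tracking each leaf's cheapest link to the tree.
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(names[u], names[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Union-find over leaves; `tree` stores the subtree held by each root.
    root = list(range(n))
    tree = {k: names[k] for k in range(n)}

    def find(k):
        while root[k] != k:
            root[k] = root[root[k]]  # path halving
            k = root[k]
        return k

    # Merging MST edges by increasing weight gives the single-linkage hierarchy.
    for _, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        root[rb] = ra
        tree[ra] = (tree[ra], tree.pop(rb))
    return tree[find(0)]
```

For three points on a line at positions 0, 1 and 2.5, the sketch first joins the two closest leaves and then attaches the third through its nearest neighbour, which is the chaining behaviour typical of single linkage.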
- cluster_fraction
The fraction of data points for the PartTree algorithm.
New in version 0.7.0.
- Type:
float
- medoid_evaluations
The number of evaluations to perform for medoid trees.
New in version 0.7.0.
- Type:
int
- medoid_sample
The number of sequences to use to perform clustering.
New in version 0.7.0.
- Type:
int
- medoid_seeds
The number of trees to select for seeding the medoid trees with PartTree.
New in version 0.7.0.
- Type:
int
- medoid_threshold
The minimum number of sequences for medoid trees to be used.
New in version 0.7.0.
- Type:
int