editdistance’s documentation¶
It is suggested to use either the function edit_distance()
or the SequenceMatcher
class.
Functions¶

edit_distance.
edit_distance
(seq1, seq2, action_function=<function lowest_cost_action>, test=<builtin function eq>)¶ Computes the edit distance between the two given sequences. This uses the relatively fast method that only constructs two columns of the 2d array for edits. This function actually uses four columns because we track the number of matches too.

edit_distance.
edit_distance_backpointer
(seq1, seq2, action_function=<function lowest_cost_action>, test=<builtin function eq>)¶ Similar to
edit_distance()
except that this function keeps backpointers during the search. This allows us to return the opcodes (i.e. the specific edits that were used to change from one string to another). This function contructs the full 2d array (actually it contructs three of them: one for distances, one for matches, and one for backpointers).
SequenceMatcher class¶

class
edit_distance.
SequenceMatcher
(a=None, b=None, test=<builtin function eq>, action_function=<function lowest_cost_action>)¶ Similar to the
difflib
SequenceMatcher
, but uses Levenshtein/edit distance.
__init__
(a=None, b=None, test=<builtin function eq>, action_function=<function lowest_cost_action>)¶ Initialize the object with sequences a and b. Optionally, one can specify a test function that is used to compare sequence elements. This defaults to the built in
eq
operator (i.e.operator.eq()
).

__weakref__
¶ list of weak references to the object (if defined)

distance
()¶ Returns the edit distance of the two loaded sequences. This should be a little faster than getting the same information from
get_opcodes()
.

find_longest_match
(alo, ahi, blo, bhi)¶ Not implemented!

get_grouped_opcodes
(n=None)¶ Not implemented!

get_matching_blocks
()¶ Similar to
get_opcodes()
, but returns only the opcodes that are equal and returns them in a somewhat different format (i.e.(i, j, n)
).

matches
()¶ Returns the number of matches in the alignment of the two sequences. This should be a little faster than getting the same information from
get_opcodes()
.

ratio
()¶ Ratio of matches to the average sequence length.

set_seq1
(a)¶ Specify a new sequence for sequence 1, resetting cached values.

set_seq2
(b)¶ Specify a new sequence for sequence 2, resetting cached values.

set_seqs
(a, b)¶ Specify two alternative sequences – reset any cached values.

Match functions¶
These functions can be used to toggle whether we’re minimizing edits or maximizing matches.

edit_distance.
lowest_cost_action
(ic, dc, sc, im, dm, sm, cost)¶ Given the following values, choose the action (insertion, deletion, or substitution), that results in the lowest cost (ties are broken using the ‘match’ score). This is used within the dynamic programming algorithm.
 ic  insertion cost
 dc  deletion cost
 sc  substitution cost
 im  insertion match (score)
 dm  deletion match (score)
 sm  substitution match (score)

edit_distance.
highest_match_action
(ic, dc, sc, im, dm, sm, cost)¶ Given the following values, choose the action (insertion, deletion, or substitution), that results in the highest match score (ties are broken using the distance values). This is used within the dynamic programming algorithm.
 ic  insertion cost
 dc  deletion cost
 sc  substitution cost
 im  insertion match (score)
 dm  deletion match (score)
 sm  substitution match (score)