Reputation: 3456
I'd like to align two lists in a similar way to what difflib.Differ
would do except I want to be able to define a match function for comparing items, not just use string equality, and preferably a match function that can return a number between 0.0 and 1.0, not just a boolean.
So, for example, say I had the two lists:
L1 = [('A', 1), ('B', 3), ('C', 7)]
L2 = ['A', 'b', 'C']
and I want to be able to write a match function like this:
def match(item1, item2):
if item1[0] == item2:
return 1.0
elif item1[0].lower() == item2.lower():
return 0.5
else:
return 0.0
and then do:
d = Differ(match_func=match)
d.compare(L1, L2)
and have it diff using the match function. Like difflib
, I'd rather the algorithm gave more intuitive Ratcliff-Obershelp type results rather than a purely minimal Levenshtein distance.
Upvotes: 8
Views: 1306
Reputation: 3456
I just wrote this implementation of Needleman-Wunsch and it seems to do what I want:
def nw_align(a, b, replace_func, insert, delete):
ZERO, LEFT, UP, DIAGONAL = 0, 1, 2, 3
len_a = len(a)
len_b = len(b)
matrix = [[(0, ZERO) for x in range(len_b + 1)] for y in range(len_a + 1)]
for i in range(len_a + 1):
matrix[i][0] = (insert * i, UP)
for j in range(len_b + 1):
matrix[0][j] = (delete * j, LEFT)
for i in range(1, len_a + 1):
for j in range(1, len_b + 1):
replace = replace_func(a[i - 1], b[j - 1])
matrix[i][j] = max([
(matrix[i - 1][j - 1][0] + replace, DIAGONAL),
(matrix[i][j - 1][0] + insert, LEFT),
(matrix[i - 1][j][0] + delete, UP)
])
i, j = len_a, len_b
align_a = ""
align_b = ""
while (i, j) != (0, 0):
if matrix[i][j][1] == DIAGONAL:
align_a += a[i - 1]
align_b += b[j - 1]
i -= 1
j -= 1
elif matrix[i][j][1] == LEFT:
align_a += "-"
align_b += b[j - 1]
j -= 1
else: # UP
align_a += a[i - 1]
align_b += "-"
i -= 1
return align_a[::-1], align_b[::-1]
Upvotes: 8
Reputation: 67760
I recently ran across a discussion of an algorithm called patience diff that sounds rather simple. You could try implementing that yourself, and then of course you can have it use whatever comparison algorithm you like.
Upvotes: 0