James Tauber
James Tauber

Reputation: 3456

how to diff / align Python lists using arbitrary matching function?

I'd like to align two lists in a similar way to what difflib.Differ would do except I want to be able to define a match function for comparing items, not just use string equality, and preferably a match function that can return a number between 0.0 and 1.0, not just a boolean.

So, for example, say I had the two lists:

L1 = [('A', 1), ('B', 3), ('C', 7)]
L2 = ['A', 'b', 'C']

and I want to be able to write a match function like this:

def match(item1, item2):
    if item1[0] == item2:
        return 1.0
    elif item1[0].lower() == item2.lower():
        return 0.5
    else:
        return 0.0

and then do:

d = Differ(match_func=match)
d.compare(L1, L2)

and have it diff using the match function. Like difflib, I'd rather the algorithm gave more intuitive Ratcliff-Obershelp type results rather than a purely minimal Levenshtein distance.

Upvotes: 8

Views: 1306

Answers (2)

James Tauber
James Tauber

Reputation: 3456

I just wrote this implementation of Needleman-Wunsch and it seems to do what I want:

def nw_align(a, b, replace_func, insert, delete):

    ZERO, LEFT, UP, DIAGONAL = 0, 1, 2, 3

    len_a = len(a)
    len_b = len(b)

    matrix = [[(0, ZERO) for x in range(len_b + 1)] for y in range(len_a + 1)]

    for i in range(len_a + 1):
        matrix[i][0] = (insert * i, UP)

    for j in range(len_b + 1):
        matrix[0][j] = (delete * j, LEFT)

    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            replace = replace_func(a[i - 1], b[j - 1])
            matrix[i][j] = max([
                (matrix[i - 1][j - 1][0] + replace, DIAGONAL),
                (matrix[i][j - 1][0] + insert, LEFT),
                (matrix[i - 1][j][0] + delete, UP)
            ])

    i, j = len_a, len_b
    align_a = ""
    align_b = ""

    while (i, j) != (0, 0):
        if matrix[i][j][1] == DIAGONAL:
            align_a += a[i - 1]
            align_b += b[j - 1]
            i -= 1
            j -= 1
        elif matrix[i][j][1] == LEFT:
            align_a += "-"
            align_b += b[j - 1]
            j -= 1
        else: # UP
            align_a += a[i - 1]
            align_b += "-"
            i -= 1

    return align_a[::-1], align_b[::-1]

Upvotes: 8

Carl Smotricz
Carl Smotricz

Reputation: 67760

I recently ran across a discussion of an algorithm called patience diff that sounds rather simple. You could try implementing that yourself, and then of course you can have it use whatever comparison algorithm you like.

Upvotes: 0

Related Questions