Xuan d

Reputation: 5

How can I compute the Levenshtein distance between sentences in a text?

I want to compute the Levenshtein distance between the sentences in one document. I found code that computes the distance at the character level, but I want it at the word level. For instance, the character-level distance for the two sentences below is 5 (the characters of "this " must be deleted), but I want the result to be 1, meaning that only one word needs to be deleted or inserted to change a into b or b into a:

a = "The patient tolerated this ."
b = "The patient tolerated ."

def levenshtein_distance(a, b):
    # Identical inputs need no edits.
    if a == b:
        return 0
    # Make a the longer sequence so each row has len(b) + 1 entries.
    if len(a) < len(b):
        a, b = b, a
    # If the shorter sequence is empty, every element of a must be deleted.
    if not b:
        return len(a)
    previous_row = list(range(len(b) + 1))
    for i, column1 in enumerate(a):
        current_row = [i + 1]
        for j, column2 in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (column1 != column2)
            current_row.append(min(insertions, deletions, substitutions))
        # Move on to the next row only after the whole row has been filled in.
        previous_row = current_row
    return previous_row[-1]

result = levenshtein_distance(a, b)
print(result)

Upvotes: 0

Views: 3839

Answers (1)

Daweo

Reputation: 36838

I suggest not reinventing the wheel: you could use pylev (https://pypi.org/project/pylev/). You can install it by running pip install pylev in a console. Then, to calculate the distance over words rather than letters, split each sentence into a list of words before passing it to pylev.levenshtein:

 import pylev
 a = "The patient tolerated this ."
 b = "The patient tolerated ."
 a = a.split(" ")
 b = b.split(" ")
 print(pylev.levenshtein(a,b))

Please keep in mind that this solution is case-sensitive and assumes all words are separated by single spaces.
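If the case sensitivity or the single-space assumption is a problem, a dependency-free variant is one option. The following is only a minimal sketch: word_levenshtein and its lowercase parameter are hypothetical names, not part of pylev, and it assumes that lowercasing plus splitting on any run of whitespace is an acceptable way to tokenize your sentences.

```python
def word_levenshtein(a, b, lowercase=True):
    """Word-level edit distance between two sentences (illustrative helper)."""
    if lowercase:
        # Assumption: case differences should not count as edits.
        a, b = a.lower(), b.lower()
    # str.split() with no argument splits on any run of whitespace,
    # so repeated spaces, tabs and newlines do not produce empty "words".
    words_a, words_b = a.split(), b.split()

    # previous_row[j] is the distance between the words of a seen so far
    # and the first j words of b.
    previous_row = list(range(len(words_b) + 1))
    for i, word_a in enumerate(words_a, start=1):
        current_row = [i]
        for j, word_b in enumerate(words_b, start=1):
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            substitution = previous_row[j - 1] + (word_a != word_b)
            current_row.append(min(insertion, deletion, substitution))
        previous_row = current_row
    return previous_row[-1]


a = "The patient tolerated this ."
b = "the  patient tolerated ."   # different case, double space
print(word_levenshtein(a, b))    # prints 1
```

Passing lowercase=False keeps the comparison case-sensitive, in which case "The" and "the" count as one substitution.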

Upvotes: 8
