Reputation: 5
I want to compute the Levenshtein distance between the sentences in a document. I found code that computes the distance at the character level, but I want it at the word level. For instance, the character-level output for the example below is 6, but I want it to be 1, meaning only one word has to be deleted or inserted to turn one sentence into the other:
a = "The patient tolerated this ."
b = "The patient tolerated ."

def levenshtein_distance(a, b):
    if a == b:
        return 0
    if len(a) < len(b):
        a, b = b, a
    if not a:
        return len(b)
    previous_row = range(len(b) + 1)
    for i, column1 in enumerate(a):
        current_row = [i + 1]
        for j, column2 in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (column1 != column2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    print(previous_row[-1])
    return previous_row[-1]

result = levenshtein_distance(a, b)
Upvotes: 0
Views: 3839
Reputation: 36838
I suggest not reinventing the wheel; you could use pylev: https://pypi.org/project/pylev/
You can install it by running pip install pylev in a console.
Then, to calculate the distance using words rather than letters:
import pylev
a = "The patient tolerated this ."
b = "The patient tolerated ."
a = a.split(" ")
b = b.split(" ")
print(pylev.levenshtein(a,b))
Please keep in mind that this solution is case-sensitive and assumes all words are space-separated.
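If you would rather not add a dependency, here is a minimal sketch of the same idea in plain Python. It reuses the row-by-row approach from the question but compares words instead of characters. The function name word_levenshtein and the lowercase flag are my own illustrative choices, not part of pylev. It splits with split() (no argument), so any run of whitespace counts as one separator.

```python
def word_levenshtein(a, b, lowercase=False):
    """Levenshtein distance counted in words rather than characters."""
    if lowercase:
        # Optional normalization so that "The" and "the" compare equal.
        a, b = a.lower(), b.lower()
    # split() with no argument splits on any run of whitespace.
    a_words, b_words = a.split(), b.split()

    previous_row = list(range(len(b_words) + 1))
    for i, word_a in enumerate(a_words):
        current_row = [i + 1]
        for j, word_b in enumerate(b_words):
            # Variable names follow the code in the question.
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (word_a != word_b)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


a = "The patient tolerated this ."
b = "The patient tolerated ."
print(word_levenshtein(a, b))  # 1
```

Note that punctuation is only treated as its own word here because the example sentences already put a space before the final period; real text would probably need a proper tokenizer first.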
Upvotes: 8