Reputation: 4618
I have the following function:
import numpy as np

def levenshtein(seq1, seq2):
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    matrix = np.zeros((size_x, size_y))
    # First column and first row: distance from the empty prefix
    matrix[:, 0] = np.arange(size_x)
    matrix[0, :] = np.arange(size_y)
    for x in range(1, size_x):
        for y in range(1, size_y):
            if seq1[x-1] == seq2[y-1]:
                # Characters match: the diagonal step costs nothing
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                # Characters differ: substitution costs 1
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1] + 1,
                    matrix[x, y-1] + 1
                )
    return matrix[size_x - 1, size_y - 1]
I want to apply it to many pairs of strings. To make it as fast as possible, I want to remove the for loops and replace them with vectorized operations, but I couldn't find a good way to do it. Any ideas?
Upvotes: 0
Views: 2604
In my opinion, it is better to use an existing Python module for this than to reinvent the wheel. You will save a lot of time.
Open cmd and run pip install python-Levenshtein, or, if you use git, go to your project folder and run git clone https://github.com/ztane/python-Levenshtein.git (GitHub link). Then open a Python file and:
import Levenshtein
Levenshtein.distance('Levenshtein', 'Lenvinsten')
# output will be 4
# ... your code ...
If you do need to write it yourself, the same link shows how other developers have implemented it, along with examples of using the Levenshtein module.
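If you want to stay in NumPy, one possible approach is to vectorize the inner loop only. This is a minimal sketch, not the method used by the python-Levenshtein module; the function name levenshtein_rows is made up for illustration, and it assumes both inputs are plain str objects. Deletion and substitution depend only on the previous row, so they can be computed for the whole row at once. The insertion term depends on the cell to the left in the same row, and that dependency can be resolved with a running minimum (np.minimum.accumulate) over the candidate values minus their column index.

```python
import numpy as np

def levenshtein_rows(seq1, seq2):
    # Hypothetical helper: assumes seq1 and seq2 are str.
    # Encode characters as integers so comparisons vectorize cleanly.
    a = np.array([ord(c) for c in seq1], dtype=np.int64)
    b = np.array([ord(c) for c in seq2], dtype=np.int64)

    idx = np.arange(len(b) + 1)
    prev = idx.copy()  # row 0: distance from the empty prefix of seq1

    for i in range(1, len(a) + 1):
        # Substitution cost per column: 0 where characters match, else 1.
        cost = (b != a[i - 1]).astype(np.int64)

        # Candidates that use only the previous row
        # (deletion and substitution/match).
        cand = np.empty_like(prev)
        cand[0] = i
        cand[1:] = np.minimum(prev[1:] + 1, prev[:-1] + cost)

        # Insertions: row[j] = min over k <= j of cand[k] + (j - k),
        # which equals a running minimum of (cand - idx), plus idx.
        prev = np.minimum.accumulate(cand - idx) + idx

    return int(prev[-1])

print(levenshtein_rows("kitten", "sitting"))
```

The outer loop over seq1 is still a Python loop, so for many short strings a compiled module is likely to remain faster. This version mainly helps when the strings are long.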
Upvotes: 3