Reputation: 2092
In the typical dynamic Levenshtein distance algorithm, to compute the value of cell d[i][j]
, where i
and j
are the row and column numbers, respectively, we take the minimum of d[i-1][j-1]+0/1
, d[i-1][j]+1
and d[i][j-1]+1
. However, it seems to me that the minimum of d[i-1][j-1]+0/1
and d[i-1][j]+1
is always going to be d[i-1][j-1]+0/1
, in which case including d[i-1][j]+1
in the computation seems redundant. Is it ever the case that d[i-1][j-1]+0/1
> d[i-1][j]+1
in the Levenshtein distance algorithm, and if not, wouldn't it be more efficient to omit this comparison?
EDIT: Sorry for the under-researched question; any standard run of the algorithm shows instances where d[i-1][j-1]+0/1
> d[i-1][j]+1
:
A
+-+-+
|0|1|
+-+-+
A|1|0|
+-+-+
(consider the second row).
Upvotes: 0
Views: 133
Reputation: 17595
Referring to the Wikipedia Article, the minimum in the last case must be taken in the "deletion" case.
Suppose we want to calculate the Levenshtein distance between abc
and ab
(from now on fixed and omitted from notation).
Iterative evaluation yields the following intermediate results.
lev(0,0) = 0 (1st case applies)
lev(0,1) = 1 (1st case applies)
lev(0,2) = 2 (1st case applies)
lev(1,0) = 1 (1st case applies)
lev(1,1) = min(2,2,0) (2nd case, minimum taken in last term) = 0
lev(1,2) = min(1,2,1) (2nd case, minumum taken in last term) = 1
lev(2,0) = 2 (1st case applies)
lev(2,1) = min(3,1,2) (2nd case, minimum taken in second term) = 1 (*)
lev(2,2) = min(2,2,0) (2nd case, minimum taken in the last term) = 0
lev(3,0) = 3 (1st case applies)
lev(3,1) = min(4,2,2) (2nd case, minimum taken in the second and third term) = 2
lev(3,2) = min(3,1,2) (2nd case, minimum taken in the second term) = 1 (*)
The row marked with (*) are cases where the second case occurs, but the minimum is not taken in the last term. An online calculator also showing the dynamic programming table can be found here.
Upvotes: 1