Reputation: 327
In the Levenshtein Distance algorithm, what does this line do?:
d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);
Although it gets the minimum of all of those values, why is cost added to the end, and why do we have + 1 at the end of each array indexer (first two parameters)?
Upvotes: 3
Views: 564
Reputation: 19895
if you readed further, you'd know: We can give different penalty costs to insertion, deletion and substitution. We can also give penalty costs that depend on which characters are inserted, deleted or substituted.
you'd e.g. include some >0 cost value in the deletion part of the formula, if you suppose than a letter deletion makes a bigger difference than a letter replacement
Upvotes: 0
Reputation: 10095
Here is a formula:
m(a,b) = if a==b then 0 else 1
The line with "min" is the recursive formula itself, the others are non-recursive cases to which the recursion leads.
Your line is using "caching" the results in array. Try to look at it as at:
d(i, j) = Minimum (d(i-1, j)+1, d(i, j-1)+1, d(i-1, j-1) + cost);
cost
is m(a,b)
, and it's zero if a==b
and one in the other case
Upvotes: 2
Reputation: 9382
From the article:
(
d[i-1, j] + 1, // deletion
d[i, j-1] + 1, // insertion
d[i-1, j-1] + 1 // substitution
)
The algorithm's purpose is to find the cheapest path to convert one string (or list) to another string. The "cost" for any given operation is considered in the line you've referenced. In yours, "deletes" are considered to be a cost of 1, "inserts" also 1, and "substitution" at a variable cost.
Upvotes: 1