csharpdev
csharpdev

Reputation: 327

Levenshtein questions

In the Levenshtein Distance algorithm, what does this line do?:

    d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

Although it gets the minimum of all of those values, why is cost added to the end, and why do we have + 1 at the end of each array indexer (first two parameters)?

Upvotes: 3

Views: 564

Answers (3)

mykhal
mykhal

Reputation: 19895

if you readed further, you'd know: We can give different penalty costs to insertion, deletion and substitution. We can also give penalty costs that depend on which characters are inserted, deleted or substituted.

you'd e.g. include some >0 cost value in the deletion part of the formula, if you suppose than a letter deletion makes a bigger difference than a letter replacement

Upvotes: 0

Valentin Golev
Valentin Golev

Reputation: 10095

Here is a formula:

alt text

m(a,b) = if a==b then 0 else 1

The line with "min" is the recursive formula itself, the others are non-recursive cases to which the recursion leads.

Your line is using "caching" the results in array. Try to look at it as at:

 d(i, j) = Minimum (d(i-1, j)+1, d(i, j-1)+1, d(i-1, j-1) + cost);

cost is m(a,b), and it's zero if a==b and one in the other case

Upvotes: 2

James Kolpack
James Kolpack

Reputation: 9382

From the article:

                (
                  d[i-1, j] + 1,  // deletion
                  d[i, j-1] + 1,  // insertion
                  d[i-1, j-1] + 1 // substitution
                )

The algorithm's purpose is to find the cheapest path to convert one string (or list) to another string. The "cost" for any given operation is considered in the line you've referenced. In yours, "deletes" are considered to be a cost of 1, "inserts" also 1, and "substitution" at a variable cost.

Upvotes: 1

Related Questions