pierroz
pierroz

Reputation: 7870

How to find the common parts of 2 strings when computing their levenshtein distance

I have to perform a fuzzy matching between a source string and a set of pattern strings. This matching is given by the formula 1 - D(I,P) / max(length(I),length(P))
where

Once I have found P that maximizes this score, I would like to have the mapping between the common parts of I and P

For instance: if I="sunday" and P="saturday", the mapping would be sth like a list of the following pairs:
{{0, 0}, {1, 3}, {3, 5}, {4, 6}, {5, 7}}
as the common characters are 's', 'u', 'd', 'a' and 'y'

In this wikipedia article, one can easily find an implementation to compute the levenshtein distance but it isn't completely clear to me how I could get the mapping from the matrix built during the process it described. Can anyone enlighten me?

thanks

Upvotes: 2

Views: 766

Answers (2)

don
don

Reputation: 850

The mapping you give as an example doesn't incorporate the edit distance at all from what I can see, as it is just looking for common characters. Maybe I am misunderstanding you, but you do not need the edit distance matrix for your mapping of common characters; the only time you would look at the edit distance would be during your D(I,P) calculation to determine the highest scoring pattern string. To generate the mapping you gave as an example, it would be a simple matter of iterating through both strings to determine the character indices for identifying the pairs.

Upvotes: 2

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798456

Start with two copies of the same array called "source" and "destination", which are the positions in the source string enumerated. Deletions remove the corresponding element from both arrays and decrement following values in the destination array. Insertions increment following values in the destination array. Then just zip both arrays and generate your map.

Upvotes: 0

Related Questions