Reputation: 7870
I have to perform a fuzzy matching between a source string and a set of pattern strings. This matching is given by the formula
1 - D(I,P) / max(length(I),length(P))
where
Once I have found P that maximizes this score, I would like to have the mapping between the common parts of I and P
For instance: if I="sunday" and P="saturday", the mapping would be sth like a list of the following pairs:
{{0, 0}, {1, 3}, {3, 5}, {4, 6}, {5, 7}}
as the common characters are 's', 'u', 'd', 'a' and 'y'
In this wikipedia article, one can easily find an implementation to compute the levenshtein distance but it isn't completely clear to me how I could get the mapping from the matrix built during the process it described. Can anyone enlighten me?
thanks
Upvotes: 2
Views: 766
Reputation: 850
The mapping you give as an example doesn't incorporate the edit distance at all from what I can see, as it is just looking for common characters. Maybe I am misunderstanding you, but you do not need the edit distance matrix for your mapping of common characters; the only time you would look at the edit distance would be during your D(I,P) calculation to determine the highest scoring pattern string. To generate the mapping you gave as an example, it would be a simple matter of iterating through both strings to determine the character indices for identifying the pairs.
Upvotes: 2
Reputation: 798456
Start with two copies of the same array called "source" and "destination", which are the positions in the source string enumerated. Deletions remove the corresponding element from both arrays and decrement following values in the destination array. Insertions increment following values in the destination array. Then just zip both arrays and generate your map.
Upvotes: 0