Reputation: 85036
I have been using it for a project I am working on, but some of the results aren't what I would choose. For example:
When "Date" is compared to
This is what we would expect from the algorithm, of course, but I'm curious whether anyone knows of something out there that gives a closer match to any compared string that contains the source string ("Date") exactly. That is, "Today's Date" would rank higher because it has "Date" in it.
Bonus points if you can find a .NET library that implements this.
Upvotes: 4
Views: 412
Reputation: 96119
To do it properly, you need some context about how it will be used.
If you are trying to do an address lookup, then "Nosuch STREET" might be a perfect match for "Nosuch ROAD", and in a no-fly list you want all 20 spellings of Gadaffi to match.
If you are trying to analyse how much a piece of historic text has changed through copying, then you need a different algorithm.
Upvotes: 0
Reputation: 49177
I think you're meant to tokenize the string into words before applying Levenshtein. As an alternative, there is Jaro-Winkler distance too.
There's a .NET library, SimMetrics, which seems to cover a few alternatives.
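To illustrate the tokenizing idea, here is a minimal sketch in Python (not .NET, and not taken from SimMetrics). It assumes whitespace tokenization and case-insensitive comparison; `best_token_distance` is a hypothetical helper name, and every candidate string other than "Today's Date" is made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance, computed with two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character from b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def best_token_distance(source: str, candidate: str) -> int:
    """Smallest distance between source and any whitespace token of candidate.

    Assumption: lowercasing both sides and splitting on whitespace is good
    enough; real data may need a smarter tokenizer.
    """
    tokens = candidate.lower().split() or [""]
    return min(levenshtein(source.lower(), token) for token in tokens)


# Hypothetical candidates; only "Today's Date" comes from the question.
candidates = ["Data", "Update", "Today's Date"]

# Rank by best token match first, then by whole-string distance as a tie-breaker.
ranked = sorted(
    candidates,
    key=lambda c: (best_token_distance("Date", c), levenshtein("Date", c)),
)
print(ranked)
```

With whole-string Levenshtein, "Today's Date" is 8 edits away from "Date" and ranks last of these three; scoring by the best-matching token puts it first because one of its tokens matches exactly.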
Upvotes: 1