Reputation: 85126

Are there any string comparison algorithms out there that are "better" than Levenshtein Distance?

I have been using it for a project I am working on, but some of the results aren't what I would choose. For example:

When "Date" is compared to

"State" it has a lev distance of 2
"Today's Date" it has a lev distance of 9

This is what we would expect from the algorithm, of course, but I'm curious whether anyone knows of something that ranks a compared string higher when it contains an exact match of the source string ("Date"). In other words, "Today's Date" would rank higher because it has "Date" in it.

Bonus points if you can find a .NET library that implements this.

Upvotes: 4

Answers (3)

Martin Beckett

Reputation: 96187

To do it properly you need some context about the use case.

If you are trying to do an address lookup, then "Nosuch STREET" might be a perfect match for "Nosuch ROAD", or in a no-fly list you want all 20 spellings of Gadaffi to match.

If you are trying to analyse how much a piece of historic text has changed through copying, then you need a different algorithm.

Upvotes: 0

Johan Sjöberg

Reputation: 49237

I think you are meant to tokenize the string before applying Levenshtein. As an alternative, there is also the Jaro-Winkler distance.

There's a .NET library, SimMetrics, which seems to cover a few alternatives.
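As a rough illustration of the tokenize-first idea, here is a minimal Python sketch (not SimMetrics code, and not .NET). It assumes tokens are split on whitespace and that the best-matching token decides the score; the helper name `best_token_distance` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def best_token_distance(query: str, candidate: str) -> int:
    """Smallest edit distance between query and any token of candidate.

    Assumption: tokens are whitespace-separated; an empty candidate is
    treated as a single empty token.
    """
    tokens = candidate.split() or [candidate]
    return min(levenshtein(query, token) for token in tokens)


if __name__ == "__main__":
    for candidate in ("State", "Today's Date"):
        print(candidate, best_token_distance("Date", candidate))
```

With this scoring, "Today's Date" gets a distance of 0 (the token "Date" matches exactly) while "State" stays at 2, so the substring match ranks first. Whether taking the minimum over tokens is the right aggregation depends on the use case.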

Upvotes: 1

Lie Ryan

Reputation: 64953

You probably want the longest common subsequence?
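To make the suggestion concrete, here is a minimal Python sketch of a longest-common-subsequence length, plus a hypothetical `lcs_score` helper that normalizes by the query length. The normalization is an assumption made for illustration, not part of the LCS definition.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the subsequence that ended before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Skip a character from either a or b, whichever keeps more.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_score(query: str, candidate: str) -> float:
    """Fraction of the query covered by the LCS (hypothetical scoring)."""
    if not query:
        return 0.0
    return lcs_length(query, candidate) / len(query)


if __name__ == "__main__":
    for candidate in ("State", "Today's Date"):
        print(candidate, lcs_score("Date", candidate))
```

Under this scoring, "Today's Date" covers all four characters of "Date" (score 1.0), while "State" only shares "ate" (score 0.75), which matches the ranking the question asks for. Note that a subsequence does not have to be contiguous, so scattered matching letters also count.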

Upvotes: 2
