Abe Miessler

Reputation: 85036

Are there any string comparison algorithms out there that are "better" than Levenshtein Distance?

I have been using it for a project I am working on, but some of the results aren't what I would choose. For example:

When "Date" is compared to

  1. "State" it has a lev distance of 2
  2. "Today's Date" it has a lev distance of 9

This is what we would expect from the algorithm, of course, but I'm curious whether anyone knows of an algorithm that scores a candidate as a closer match when it contains the source string ("Date") exactly. In other words, "Today's Date" would rank higher because it has "Date" in it.

Bonus points if you can find a .NET library that implements this.

Upvotes: 4

Views: 412

Answers (3)

Martin Beckett

Reputation: 96119

To do it properly you need some context about the use.

If you are trying to do an address lookup, then "Nosuch STREET" might be a perfect match for "Nosuch ROAD", or in a no-fly list you want all 20 spellings of Gadaffi to match.

If you are trying to analyse how much a piece of historic text has changed through copying, then you need a different algorithm.

Upvotes: 0

Johan Sjöberg

Reputation: 49177

I think you are meant to tokenize the string into words before applying Levenshtein. As an alternative, there is also Jaro-Winkler distance.

There's a .NET library, SimMetrics, which seems to cover a few alternatives.
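
To illustrate the tokenizing idea, here is a minimal Python sketch (not the SimMetrics API, and not .NET). It assumes whitespace tokenization and takes the smallest distance between the query and any token; the helper name `best_token_distance` is made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def best_token_distance(query: str, candidate: str) -> int:
    """Hypothetical helper: smallest distance from query to any
    whitespace-separated token of candidate (an assumed scoring rule)."""
    tokens = candidate.split() or [candidate]
    return min(levenshtein(query, token) for token in tokens)


print(best_token_distance("Date", "State"))         # 2
print(best_token_distance("Date", "Today's Date"))  # 0
```

With this scoring, "Today's Date" ranks ahead of "State" because one of its tokens matches "Date" exactly. Whether taking the minimum over tokens is the right rule depends on the data.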

Upvotes: 1

Lie Ryan

Reputation: 64827

Perhaps you are looking for the longest common subsequence?
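
A rough Python sketch of how that could work for the example in the question, assuming a case-sensitive comparison and using the subsequence length directly as the score (higher is better):

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b,
    computed row by row with dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a character from one string or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("Date", "State"))         # 3 ("ate")
print(lcs_length("Date", "Today's Date"))  # 4 ("Date")
```

Here "Today's Date" scores 4 against 3 for "State", so it would rank higher. Longer candidates tend to score higher in general, so some normalization by length may be needed in practice.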

Upvotes: 2
