Evan Levesque
Evan Levesque

Reputation: 3213

How Similar Are Two Words

is there more accurate algorithm than "Levenshtein distance" algorithm?? http://en.wikipedia.org/wiki/Levenshtein_distance

Upvotes: 0

Views: 744

Answers (1)

Regexident
Regexident

Reputation: 29562

There is the Damerau–Levenshtein distance, which adds support for character transpositions and providing more coverage for common typos.

To get a similarity percentage for Levenshtein or Damerau-Levenshtein do something like this:

int relative_similarity = 1.0 - 1.0 / ((len(x) + len(y)) / 2) * lev(x, y); //untested

Alternatively you might want to take a look at the longest common subsequence as a metric of similarity.

Next there are

which are phonetic matching algorithms.

While Smith and its german counterpart Schmidt would turn up as quite different using the edit distance (a.k.a Levenshtein), Soundex and MEtaphone would consider them phonetically similar or even equivalent.


But without you telling us what's wrong about the pure Levenshtein distance it's hard to guess a better algorithm.

Upvotes: 4

Related Questions