How Similar Are Two Words

Question

is there more accurate algorithm than "Levenshtein distance" algorithm?? http://en.wikipedia.org/wiki/Levenshtein_distance

Regexident · Accepted Answer

There is the Damerau–Levenshtein distance, which adds support for character transpositions and providing more coverage for common typos.

To get a similarity percentage for Levenshtein or Damerau-Levenshtein do something like this:

int relative_similarity = 1.0 - 1.0 / ((len(x) + len(y)) / 2) * lev(x, y); //untested

Alternatively you might want to take a look at the longest common subsequence as a metric of similarity.

Next there are

which are phonetic matching algorithms.

While Smith and its german counterpart Schmidt would turn up as quite different using the edit distance (a.k.a Levenshtein), Soundex and MEtaphone would consider them phonetically similar or even equivalent.

But without you telling us what's wrong about the pure Levenshtein distance it's hard to guess a better algorithm.

How Similar Are Two Words

Answers (1)

Related Questions