jon bondy

Reputation: 421

Need a routine to detect strings that are similar but not identical

I have a list of strings, some of which have been modified since my previous release. Some of the changes are trivial (spacing, off by one word, etc.). I would like to detect strings that have only "minor" differences, so that I can try to reuse the older translations where possible.

What do I mean by "minor differences"? I will not know until I start working with the database.

Do you know of any tunable routines that will indicate when two strings are similar but not identical? Any routines that will return a number indicating how different two strings are?

Upvotes: 11

Views: 1551

Answers (1)

user267885

There are many such algorithms. The term to search for is "fuzzy string matching".
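As one example of a tunable routine that returns a number, here is a minimal sketch in Python (chosen only for illustration; the question names no language). It uses the standard library's `difflib`, whose `SequenceMatcher.ratio()` returns a score between 0.0 and 1.0 and whose `get_close_matches` takes a `cutoff` threshold. Note that `difflib` uses its own matching-blocks algorithm, not Levenshtein distance. The helper names `similarity` and `find_reusable`, the sample strings, and the 0.8 cutoff are hypothetical.

```python
import difflib


def similarity(old: str, new: str) -> float:
    """Hypothetical helper: score in [0.0, 1.0], where 1.0 means identical."""
    return difflib.SequenceMatcher(None, old, new).ratio()


def find_reusable(new: str, old_strings: list[str], cutoff: float = 0.8) -> list[str]:
    """Hypothetical helper: return up to 3 old strings similar enough to `new`
    that their translations might be reused, best match first.

    `cutoff` is the tunable knob (an assumed starting value): raise it to
    accept only smaller changes, lower it to accept larger ones.
    """
    return difflib.get_close_matches(new, old_strings, n=3, cutoff=cutoff)


old = ["Save the file", "Open a file", "Close all windows"]
print(similarity("Save the file", "Save  the file"))
print(find_reusable("Save this file", old))  # ['Save the file']
```

Since you do not yet know what "minor" means for your data, the cutoff is the value to experiment with against a sample of old/new string pairs.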

A well-known one is the Levenshtein distance. It counts the minimum number of single-character edits (insertions, deletions or substitutions) required to transform one string into the other, which gives you an estimate of how similar the strings are.
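A minimal Python sketch of the Levenshtein distance, using the standard dynamic-programming approach with a single previous row. The `is_minor_change` helper and its `max_ratio` default of 0.1 are hypothetical: they illustrate one way to turn the distance into a tunable yes/no test by dividing by the longer string's length, so that long and short strings are judged on the same scale.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep `b` short so each row is small
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1       # a[:i] -> b[:j-1], then insert cb
            delete_cost = previous[j] + 1          # a[:i-1] -> b[:j], then delete ca
            replace_cost = previous[j - 1] + (ca != cb)  # substitute if different
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def is_minor_change(old: str, new: str, max_ratio: float = 0.1) -> bool:
    """Hypothetical helper: a change counts as "minor" when the edit distance
    is at most `max_ratio` of the longer string's length. The 0.1 default is
    an arbitrary starting point to tune against real data."""
    longest = max(len(old), len(new))
    if longest == 0:
        return True  # two empty strings are identical
    return levenshtein(old, new) / longest <= max_ratio


print(levenshtein("kitten", "sitting"))  # 3
print(is_minor_change("Save the file", "Save  the file"))  # True (1 edit / 14 chars)
```

The normalization step is a design choice rather than part of the Levenshtein definition; a fixed maximum edit count would also work if your strings are all of similar length.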

See also this question: How to search for similar words for solutions in Delphi.

Upvotes: 9
