dukevin
dukevin

Reputation: 23178

Partial matching

Is there a built in function or a function that someone has already written that can match names without being exact?

For example, I have:

Marry
John
Steve
Steven
Stewie

If someone types "stew" the function would return Stewie.
Or if someone types "ry" the function would return Marry.
Or if someone misspells "Marries" the function would still return Marry. (due to being the most similar of them all)
If "Ste" is supplied it can return false but it doesn't really matter to me.

Does anyone know how to write this sort of function or know of one already written? Seeing as this is probably a common thing, I would assume so.

Thanks.

Upvotes: 2

Views: 296

Answers (2)

fyr
fyr

Reputation: 20859

Actually there are some methods to achieve this:

Built-In methods

Not Built in methods

  • LCS Longest common subsequence
  • Letter N-Grams (used sometimes for spellchecking)
  • Levensthein automaton
  • Word lists (just for completeness)

One of those should help you to solve your problem.

The problem of every of those algorithms is that they are not accurate. So you will have a heuristical solution to the problem.

Usually there are pro and cons between distance and sound algorithms. Sound specific algorithms are less accurate(round about 33% accuracy). But fast. Levensthein is much more accurate but slow. At least the php implementation. There are other systems where Levensthein is faster by a large margin (see e.g. Levensthein Automata. But this automata algorithm is not built in in php).

Probably as a basic hint:

  • If you have much unique terms to compare against dont use Similar_text or Levensthein stick with Sound algorithms
  • If you have a pretty small set use Levensthein.

Upvotes: 2

oezi
oezi

Reputation: 51797

sounds like soundex() or metaphone() is what you're looking for. using those, you can calculate a "key" that represents how a word sounds - doing this for all strings you can compare if two words sound the same (optimized for english language).

another possibility would be levenshtein() wich directly calculates the difference between two strings, so you can compare all strings and show the 5 best hits or something like that.

Upvotes: 1

Related Questions