Umut Derbentoğlu
Umut Derbentoğlu

Reputation: 1196

Best Practice For Levenshtein Distance on SQL Server

I have a web and a mobile dictionary application that uses SQL Server. I am trying to implement a simple version of "did you mean" feature. If the phrase that user entered is not exists in the db, I need make a suggestions.

I am planning to use the levenshtein distance algorithm. But there is a point that I couldn't figure out: do I need to calculate the levenshtein distance between user entry and all the words that exists in my db one by one?

Let's assume that I have one million word in my database. When user enters an incorrect word, will I calculate distance a million time?

Obviously that would need a great deal of time. What is the best practice for this situation?

Upvotes: 3

Views: 4162

Answers (2)

Jason A. Long
Jason A. Long

Reputation: 4442

In terms of implementation, I'd set it up so that the word list gets cached to the web server and do the comparisons there. You don't want to execute a database stored procedure every time a user makes a keystroke. For performance reasons, you'll want to make the back & forth as shot and simple as possible. Besides, procedural languages are better at making these types of calculations than declarative languages anyway. If possible you may create a small indexed cache on the client machine so that the final stages can be completed w/o making any web calls.

In terms of making the actual matches, look up Lawrence Philips' Double Metaphone algorithm. It's not as good a Google's "did you mean?" but it's much better than SOUNDEX... And it's been translated into multiple coding languages. By using double metaphone in conjunction with Levenshtein distance you should be able to made some good matches.

Upvotes: 0

Frederik Gheysels
Frederik Gheysels

Reputation: 56934

Have you already looked at the SOUNDEX user defined function that is available in SQL Server ?

You could use a trigger which calculates the soundex of a column and saves it next to that column each time the column is updated. When searching, you can calculate the soundex of the search criterium and compare it with the stored soundex-column in the table.

Upvotes: 1

Related Questions