PaulOTron2000
PaulOTron2000

Reputation: 1225

SQL Server Express: something better than LIKE for queries of SIMILAR names

I need a special operator that's maybe a bit better than LIKE to query for "similar" values.

THE SCENARIO:
I have a table of students, and I have a table of lessons. The table of lessons was imported from other software, so the StudentID column is null on the imported rows. So we need the user to manually select the appropriate student row for each lesson, and then the StudentID column can be populated so they're properly synced. Both tables contain first and last names, but a lot of them are very likely to be misspelled.

After importing the rows, I would like to present the user with the names from the student rows where the names are "top five most similar" to the values stored in each lesson row. In fact I'd like to present them in descending order from most-to-least similar.

A query containing the LIKE operator doesn't quite cut it because it requires specific text must exist within the column, and it doesn't return a "similarity score".

It is my understanding (from non-technical articles) that the US Post Office has this issue very well handled... People misspell names and street names all the time, but their algorithm for "find similar" is very effective.

I know the algorithm could vary from one solution to the next. For example I read from the article that some algorithms consider phonetics, others consider the count of vowels and consonants, while others consider that "T" sounds like "P" when spoken over the phone.

I COULD load every record into my app code and write my own algorithm in c#, VB.NET or whatever, but there are lots of problems with that including performance. I'd rather accomplish this within the query so I'm looking for alternatives.

I'm using SQL Server Express but I'm sure the solution applies to other database platforms.

Upvotes: 0

Views: 398

Answers (2)

Mikey G
Mikey G

Reputation: 3491

SQL FreeText might work for you:

http://msdn.microsoft.com/en-us/library/ms176078.aspx

It searches against a Thesaurus, although i'm not sure how well it does with names. Its very easy to implement, however.

Upvotes: 1

Mithrandir
Mithrandir

Reputation: 25377

SQL Server supports the SOUNDEX() function, but this works only for similar sounding names and that not to well, at least if you handle non english texts. You could write you own function in c# or vb.net, facilitating any algorithm that might apply to your needs, and import it as a scalar function into sql server.

Upvotes: 1

Related Questions