Reputation: 2143
Is there a way to increase the fault tolerance of the SQL SOUNDEX function when used for fuzzy string matching? I am using this to search for supplier names. What I have is:
WHERE
SOUNDEX(@SearchTerm) = SOUNDEX(s.Name)
This works to some degree, but I want to increase the fault tolerance a bit. For example:
SOUNDEX('test') = T230
SOUNDEX('tet') = T300
SOUNDEX('tets') = T320
SOUNDEX('tes') = T200
Ideally, I would like to have all these examples match. Also, a slight issue is when a supplier name consists of 2 words and the user searches by the second word, this will obviously work since the marching is dependent on the first letter of the word:
SOUNDEX('test supplier') = T230
SOUNDEX('supplier') = S146
Originally, I was making use of a fuzzy string comparison library in the web application that used various approximate string comparison algorithms, but it turns out when 20 people search concurrently for 5000+ suppliers, it's too much for the web server to handle. I'm now attempting to do this in a stored procedure so that the database returns only the searched results and not the complete list of suppliers to be searched.
Upvotes: 0
Views: 682
Reputation: 39015
You can implement an UDF and deploy it in the server using SQL CLR integration. (More spceifically: CLR Scalar-Valued Functions)
I can assure this works because I'm doing exactly that same thing and looking over more than 350.000 rows (without any other filter) and it works really fast. It's a variation of Damerau-Leventshein algorithm, thus a time consuming algorithm.
The difference between returning all the rows to the app, and filtering it in the app side, and doing the filtering directly inside SQL Server is huge.
Upvotes: 1