Com Piler
Com Piler

Reputation: 267

Making regex search more efficient in SQL Server

I'm trying to create a scrabble-playing bot. So I thrown all (2,7 mln) polish words to a SQL Server database and now working on creating patterns for regex queries. I have just found out that adding some conditions to WHERE clause could make search much more efficient.

As an example, executing query:

SELECT * FROM words WHERE dbo.[like](word, '^[def]{1,3}$') = 1;

lasts about 43 sec, but adding quite obvious length condition:

SELECT * 
FROM words 
WHERE dbo.[like](word, '^[def]{1,3}$') = 1 AND LEN(word) <= 3;

reduces execution time to 3 sec... could you tell me why, and advise some techniques that would help me make queries more efficient?

P.S. like function is CLR written in c#:

public static bool Like(string text, string pattern)
{
    Match match = Regex.Match(text, pattern);

    return (match.Value != String.Empty);
}

Upvotes: 2

Views: 3891

Answers (2)

Com Piler
Com Piler

Reputation: 267

Basing on @Laurence comment and @user3973227 answer I made the best solution so far. I tried putting my dicitionary into memory, and then filtered it with regex like that:

var command = new SqlCommand("SELECT word FROM words", con);
SqlDataReader reader = command.ExecuteReader();
Regex regex = new Regex("^[def]{1,3}$");
while (reader.Read())
{
   if(regex.IsMatch(reader.GetString(0)))
   GameOperations.Log(reader.GetString(0));
}

it did the job in... 2,6 seconds... when I added length condition to my query:

var command = new SqlCommand("SELECT word FROM words WHERE LEN(word) <= 3", con);

I got results in 0,09 sec!

It proves that this way of filtering table with regex is better than any CLR. Adding more conditions to query also speeds up the process, especially when they are sargable (capable of using indexes). Using Regex constructor is important too, in my case reduced the time of filtering twice.

Upvotes: 0

T J
T J

Reputation: 110

using clr and regex in sql is slow and there is not much you can do about it. my advice is to limit the amount of regex you need to run and try to reduce the amount of data you have to run the regex on, for example like you did in your second query.

the "show query plan" window is your friend for optimizing sql queries.

the built in sql LIKE operator is going to be much quicker but of course is also much more limited. the built in LIKE operator also have the added bonus of sometimes being able to use indexes.

you could possibly rewrite your query using sql LIKE into something like this:

SELECT *
FROM words
WHERE
 word LIKE '[def][def][def]'
 OR word LIKE '[def][def]'
 OR word LIKE '[def]'

but it is not pretty

you may also find this one interesting: What makes a SQL statement sargable?

Upvotes: 1

Related Questions