Max
Max

Reputation: 1299

Regex with SQL Server 2008 CLR performance issues

I am trying to understand why is it taking so long to execute a simple query. In my local machine it takes 10 seconds but in production it takes 1 min. (I imported the database from production into my local database)

select * 
from JobHistory
where dbo.LikeInList(InstanceID, 'E218553D-AAD1-47A8-931C-87B52E98A494') = 1

The table DataHistory is not indexed and it has 217,302 rows

public partial class UserDefinedFunctions
{
    [SqlFunction]
    public static bool LikeInList([SqlFacet(MaxSize = -1)]SqlString value, [SqlFacet(MaxSize = -1)]SqlString list)
    {
        foreach (string val in list.Value.Split(new char[] { ',' }, StringSplitOptions.None))
        {
            Regex re = new Regex("^.*" + val.Trim() + ".*$", RegexOptions.IgnoreCase);

            if (re.IsMatch(value.Value))
            {
                return(true);
            }
        }

        return (false);
    }
};

And the issue is that if a table has 217k rows then I will be calling that function 217,000 times! not sure how I can rewrite this thing.

Thank you

Upvotes: 1

Views: 324

Answers (1)

Solomon Rutzky
Solomon Rutzky

Reputation: 48864

There are several issues with this code:

  1. Missing (IsDeterministic = true, IsPrecise = true) in [SqlFunction] attribute. Doing this (mainly just the IsDeterministic = true part) will allow the SQLCLR UDF to participate in parallel execution plans. Without setting IsDeterministic = true, this function will prevent parallel plans, just like T-SQL UDFs do.
  2. Return type is bool instead of SqlBoolean
  3. RegEx call is inefficient: using an instance method once is expensive. Switch to using the static Regex.IsMatch instead
  4. RegEx pattern is very inefficient: wrapping the search string in "^.*" and ".*$" will require the RegEx engine to parse and retain in memory as the "match", the entire contents of the value input parameter, for every single iteration of the foreach. Yet the behavior of Regular Expressions is such that simply using val.Trim() as the entire pattern would yield the exact same result.

  5. (optional) If neither input parameter will ever be over 4000 characters, then specify a MaxSize of 4000 instead of -1 since NVARCHAR(4000) is much faster than NVARCHAR(MAX) for passing data into, and out of, SQLCLR objects.

Upvotes: 1

Related Questions