Nick
Nick

Reputation: 4766

SQL Server : create regex search

I am trying to use a regular expression to do an accent insensitive search on an accent sensitive column.

  1. I cannot change the column to an accent insensitive collation because the column is unique and we need to allow both 'Jose' and 'José' to be entered.

  2. I cannot coerce the collation within my query because the table holds a ton of data and coercing it will cause it to do an index scan instead of a seek and the query times out

  3. I cannot add a new non-unique column that uses an accent insensitive collation. This would avoid the 2 issues above because the original column could be accent sensitive and unique and I would be able to do the search on the new column and use the index on it, however I'm being told I can't do this because it would cause data duplication.

So, I'm trying to do this using a regular expression search on the accent sensitive column. In my application I take in whatever string the user entered and I modify the string in such a way that for this input

"Jose"

I get the output

"J[òóôõöø][sš][eèéêë]%"

Using this string to search works. It finds "Jose" and "José" and it searches with the correct index and doesn't time out.

The only issue I have now is with the darn æ character.

If the database contains the values "aéro" & "æro" and the user enters "aero", currently my application will generate the search strings "[aàáâãäå][eèéêë]r[oòóôõöø]%" But this will only be a match for "aéro". It will not match "æro". By default in sql server it already treats "ae" the same as "æ" whether the collation is accent sensitive or insensitive, so not modifying the user's input would return "æro" as a match, but would not return "aéro" as a match.

Does anyone have any ideas how I can do a regular expression search in SQL that will match both "aé" and "æ"?

SELECT *
FROM mytable
WHERE name LIKE '[aàáâãäå][eèéêë]%'

Upvotes: 2

Views: 151

Answers (1)

Edmond Quinton
Edmond Quinton

Reputation: 1739

Unfortunately, SQL Server LIKE statement does not support the full regex functionality. You can however achieve the result you are looking for by modifying your current query as follow:

DECLARE @primarySearch VARCHAR(100) = '[aàáâãäå][eèéêë]r[oòóôõöø]%';
DECLARE @secondarySearchKey VARCHAR(100) = REPLACE(@primarySearch, '[aàáâãäå][eèéêë]', 'æ')

SELECT *
FROM    mytable
WHERE   Name LIKE @primarySearch OR Name LIKE @secondarySearchKey

In the modified version we add a second OR comparison check that simply replaced [aàáâãäå][eèéêë] combination with the æ character. This will add some additional overhead to your query but I doubt it will be significant.

UPDATE:

The above solution will not work for multiple occurrence [aàáâãäå][eèéêë] combination as per Nick’s comments. You could however perform an initial wider search that only matches the pattern up to the first occurrence of the [aàáâãäå][eèéêë] character combination. Once the initial match is perform you can normalize the text by replacing the occurrence of the ‘æ’ character with ‘ae’ string. Once this is done you can perform the refined search as per following example:

DECLARE @primarySearch VARCHAR(100) = '[aàáâãäå][eèéêë]_[aàáâãäå][eèéêë]_[aàáâãäå][eèéêë]%';

IF (@primarySearch LIKE '%\[aàáâãäå\]\[eèéêë\]%' ESCAPE '\') OR  @primarySearch LIKE '%æ%' 
BEGIN

    DECLARE @firstOccurrence INT = CHARINDEX('[aàáâãäå][eèéêë]', @primarySearch);
    DECLARE @initialSearchKey VARCHAR(100) = LEFT(@primarySearch, @firstOccurrence + LEN('[aàáâãäå][eèéêë]') - 1) + '%'
    DECLARE @initialSearchSpecialCaseKey VARCHAR(100) = REPLACE(@initialSearchKey, '[aàáâãäå][eèéêë]', 'æ');

    SELECT @initialSearchKey AS IntialKey;
    SELECT @initialSearchSpecialCaseKey AS SpecialCaseKey ;

    WITH InitialSearch AS  
    (
        SELECT  Name
                ,REPLACE(Name, 'æ', 'ae') AS NormalizedName
        FROM    mytable
        WHERE   Name LIKE @initialSearchKey OR Name LIKE @initialSearchSpecialCaseKey
    )
    SELECT  Name
    FROM    InitialSearch
    WHERE   NormalizedName LIKE @primarySearch

END
ELSE
BEGIN

    SELECT  Name
    FROM    mytable
    WHERE   Name LIKE @primarySearch
END

Upvotes: 2

Related Questions