Control Freak
Control Freak

Reputation: 13213

Using RegEx in SQL Server

I'm looking how to replace/encode text using RegEx based on RegEx settings/params below:

RegEx.IgnoreCase = True     
RegEx.Global = True     
RegEx.Pattern = "[^a-z\d\s.]+"   

I have seen some examples on RegEx, but confused as to how to apply it the same way in SQL Server. Any suggestions would be helpful. Thank you.

Upvotes: 133

Views: 656350

Answers (10)

Walter Verhoeven
Walter Verhoeven

Reputation: 4411

I am sure you can in the future as Microsoft will bring regex to SQL server but today you can try something like this CLR method:

public static class RegexFunctions
{
    private static readonly ConcurrentDictionary<string, Regex> _regexCache = new ConcurrentDictionary<string, Regex>(StringComparer.OrdinalIgnoreCase);
    /// <summary>
    /// Regexes the like.
    /// </summary>
    /// <param name="input">The input.</param>
    /// <param name="pattern">The pattern.</param>
    /// <returns>SqlBoolean.</returns>
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean REGEX_LIKE(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull)
            return SqlBoolean.False;

        var regex = _regexCache.GetOrAdd(
                        pattern.Value,
                        pat => new Regex(pat, RegexOptions.IgnoreCase | RegexOptions.Compiled)
                    );

        return regex.IsMatch(input.Value);
    }
}

register it with something like this:

CREATE FUNCTION dbo.REGEX_LIKE(@input NVARCHAR(MAX), @pattern NVARCHAR(MAX))
RETURNS BIT
AS EXTERNAL NAME IDPS_DMZ_Native_Sql.RegexFunctions.REGEX_LIKE;

just remember you need to have CLR enabled:

sp_configure 'clr enabled', 1;
RECONFIGURE;

then you can use it like so:

SELECT [UrlPath]
FROM [YourTable]
WHERE dbo.REGEX_LIKE([UrlPath], '(?i)(backup.*(?:\.7z|\.zip|\.tar\.gz)).*(https?://[^\s]+)') = 1;

to find URLs looking for compressed files in paths that contain the term backup. This example assumes you want to identify any URL patterns in a directory path containing backup (case-insensitive) and ending with .zip, .7z, or .tar.gz.

This provides one rule and a "and" condition, not something you can solve with a like statement

Upvotes: 0

Gabe
Gabe

Reputation: 6045

Regular Expression support is coming to Azure SQL Database

  • REGEXP_LIKE: This function returns TRUE if a string matches a regular expression patjtern, or FALSE otherwise.
  • REGEXP_COUNT: This function returns the number of times a regular expression pattern matches in a string.
  • REGEXP_INSTR: This function returns the starting or ending position, based on the specified option, of the given occurrence of a regular expression pattern in a string.
  • REGEXP_REPLACE: This function returns a modified string replaced by a ‘replacement string’, where occurrence of the regular expression pattern found.
  • REGEXP_SUBSTR: This function returns a substring that matches a regular expression pattern from a string.

https://devblogs.microsoft.com/azure-sql/introducing-regular-expression-regex-support-in-azure-sql-db/

Upvotes: 1

Dai
Dai

Reputation: 155035

Update for 2024

As of April 2024, Azure SQL Database has now (finally) added support for "real" Regular-Expressions using the POSIX dialect (as opposed to the PCRE or .NET Regex dialects, oddly); it is currently in Preview, so enabling this feature requires you to opt-in via a sign-up form and pray to Satya that your request is approved).

On-prem Microsoft SQL Server will gain the same regex feature in the next major release ("vNext" as they say; so, that'll be SQL Server 2025 presumably).

The new set of functions is somewhat documented in this community posting, but no reference for these functions yet exists in the main official documentation.


As for solving the OP's problem from 2012 when they wanted to replace text matching [^a-z\d\s.]+, then here's how you'd do it:

(The pattern \[^a-z\d\s.\]+ looks like it should match any run of characters that are not lowercase Latin letters, digits, whitespace or dots . - the . does not need to be escaped with a backslash when used in a [^…] not-in-range specifier).

So given example input "abcd123 ABCD 789 !£$%" then the abcd123 and 789 parts won't match the regex, but "ABCD" and "!£$%" does.

Usage with T-SQL variables:

To simply remove matching substrings, use REGEXP_REPLACE( input, pattern, replacement ) where the replacement argument is an empty-string:

(The N''-style of string-literal denotes an nvarchar literal, which avoids potential implicit conversion issues when working with nvarchar data).

DECLARE @input    nvarchar(255) = N'abcd123 ABCD 789 !£$%';
DECLARE @replaced nvarchar(255) = REGEXP_REPLACE( @input, N'[^a-z\d\s.]+', N'' );
SELECT @replaced AS "Output";

Outputs:

Output
"abcd123 789 "

Usage with SQL UPDATE DML:

While a REGEXP_REPLACE won't replace anything if the input text doesn't match, it still makes sense to add a WHERE clause with REGEXP_LIKE because "non-updating UPDATEs" still count as updates and cause writes for those rows that wouldn't otherwise be updated.

UPDATE
    yourTable
SET
    yourColumn = REGEXP_REPLACE( yourColumn, N'[^a-z\d\s.]+', N'' )
WHERE
    REGEXP_LIKE( yourColumn, N'[^a-z\d\s.]+' );

Upvotes: 3

clamum
clamum

Reputation: 1364

I thought I needed RegEx in order to perform my query (to find content that has an <img> tag but does not have an alt attribute in the tag) but that ended up not being the case (I had found a regex for the above criteria but it wasn't working for me in SQL).

In my case the following query was perfect. You may also have a situation where you don't actually need RegEx for your situation. The below query works because of sequential processing of the WHERE clause:

select *
from Content c
where c.Deleted = 0
    and c.content_html like '%<img%'
    and c.content_html not like '%alt=%'

Upvotes: 0

Rubens Farias
Rubens Farias

Reputation: 57936

You do not need to interact with managed code, as you can use LIKE:

CREATE TABLE #Sample(Field varchar(50), Result varchar(50))
GO
INSERT INTO #Sample (Field, Result) VALUES ('ABC123 ', 'Do not match')
INSERT INTO #Sample (Field, Result) VALUES ('ABC123.', 'Do not match')
INSERT INTO #Sample (Field, Result) VALUES ('ABC123&', 'Match')
SELECT * FROM #Sample WHERE Field LIKE '%[^a-z0-9 .]%'
GO
DROP TABLE #Sample

As your expression ends with + you can go with '%[^a-z0-9 .][^a-z0-9 .]%'

EDIT:
To make it clear: SQL Server doesn't support regular expressions without managed code. Depending on the situation, the LIKE operator can be an option, but it lacks the flexibility that regular expressions provides.

Upvotes: 142

Fiach Reid
Fiach Reid

Reputation: 7059

A similar approach to @mwigdahl's answer, you can also implement a .NET CLR in C#, with code such as;

using System.Data.SqlTypes;
using RX = System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
 [Microsoft.SqlServer.Server.SqlFunction]
 public static SqlString Regex(string input, string regex)
 {
  var match = RX.Regex.Match(input, regex).Groups[1].Value;
  return new SqlString (match);
 }
}

Installation instructions can be found here

Upvotes: 2

Ravi Makwana
Ravi Makwana

Reputation: 2918

Regular Expressions In SQL Server Databases Implementation Use

Regular Expression - Description
. Match any one character
* Match any character
+ Match at least one instance of the expression before
^ Start at beginning of line
$ Search at end of line
< Match only if word starts at this point
> Match only if word stops at this point
\n Match a line break
[] Match any character within the brackets
[^...] Matches any character not listed after the ^
[ABQ]% The string must begin with either the letters A, B, or Q and can be of any length
[AB][CD]% The string must have a length of two or more and which must begin with A or B and have C or D as the second character
[A-Z]% The string can be of any length and must begin with any letter from A to Z
[A-Z0-9]% The string can be of any length and must start with any letter from A to Z or numeral from 0 to 9
[^A-C]% The string can be of any length but cannot begin with the letters A to C
%[A-Z] The string can be of any length and must end with any of the letters from A to Z
%[%$#@]% The string can be of any length and must contain at least one of the special characters enclosed in the bracket

Upvotes: 10

mwigdahl
mwigdahl

Reputation: 16578

You will have to build a CLR procedure that provides regex functionality, as this article illustrates.

Their example function uses VB.NET:

Imports System
Imports System.Data.Sql
Imports Microsoft.SqlServer.Server
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices
Imports System.Text.RegularExpressions
Imports System.Collections 'the IEnumerable interface is here  


Namespace SimpleTalk.Phil.Factor
    Public Class RegularExpressionFunctions
        'RegExIsMatch function
        <SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _
        Public Shared Function RegExIsMatch( _
                                            ByVal pattern As SqlString, _
                                            ByVal input As SqlString, _
                                            ByVal Options As SqlInt32) As SqlBoolean
            If (input.IsNull OrElse pattern.IsNull) Then
                Return SqlBoolean.False
            End If
            Dim RegExOption As New System.Text.RegularExpressions.RegExOptions
            RegExOption = Options
            Return RegEx.IsMatch(input.Value, pattern.Value, RegExOption)
        End Function
    End Class      ' 
End Namespace

...and is installed in SQL Server using the following SQL (replacing '%'-delimted variables by their actual equivalents:

sp_configure 'clr enabled', 1
RECONFIGURE WITH OVERRIDE

IF EXISTS ( SELECT   1
            FROM     sys.objects
            WHERE    object_id = OBJECT_ID(N'dbo.RegExIsMatch') ) 
   DROP FUNCTION dbo.RegExIsMatch
go

IF EXISTS ( SELECT   1
            FROM     sys.assemblies asms
            WHERE    asms.name = N'RegExFunction ' ) 
   DROP ASSEMBLY [RegExFunction]

CREATE ASSEMBLY RegExFunction 
           FROM '%FILE%'
GO

CREATE FUNCTION RegExIsMatch
   (
    @Pattern NVARCHAR(4000),
    @Input NVARCHAR(MAX),
    @Options int
   )
RETURNS BIT
AS EXTERNAL NAME 
   RegExFunction.[SimpleTalk.Phil.Factor.RegularExpressionFunctions].RegExIsMatch
GO

--a few tests
---Is this card a valid credit card?
SELECT dbo.RegExIsMatch ('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$','4241825283987487',1)
--is there a number in this string
SELECT dbo.RegExIsMatch( '\d','there is 1 thing I hate',1)
--Verifies number Returns 1
DECLARE @pattern VARCHAR(255)
SELECT @pattern ='[a-zA-Z0-9]\d{2}[a-zA-Z0-9](-\d{3}){2}[A-Za-z0-9]'
SELECT  dbo.RegExIsMatch (@pattern, '1298-673-4192',1),
        dbo.RegExIsMatch (@pattern,'A08Z-931-468A',1),
        dbo.RegExIsMatch (@pattern,'[A90-123-129X',1),
        dbo.RegExIsMatch (@pattern,'12345-KKA-1230',1),
        dbo.RegExIsMatch (@pattern,'0919-2893-1256',1)

Upvotes: 14

Zachary Scott
Zachary Scott

Reputation: 21162

Slightly modified version of Julio's answer.

-- MS SQL using VBScript Regex
-- select dbo.RegexReplace('aa bb cc','($1) ($2) ($3)','([^\s]*)\s*([^\s]*)\s*([^\s]*)')
-- $$ dollar sign, $1 - $9 back references, $& whole match

CREATE FUNCTION [dbo].[RegexReplace]
(   -- these match exactly the parameters of RegExp
    @searchstring varchar(4000),
    @replacestring varchar(4000),
    @pattern varchar(4000)
)
RETURNS varchar(4000)
AS
BEGIN
    declare @objRegexExp int, 
        @objErrorObj int,
        @strErrorMessage varchar(255),
        @res int,
        @result varchar(4000)

    if( @searchstring is null or len(ltrim(rtrim(@searchstring))) = 0) return null
    set @result=''
    exec @res=sp_OACreate 'VBScript.RegExp', @objRegexExp out
    if( @res <> 0) return '..VBScript did not initialize'
    exec @res=sp_OASetProperty @objRegexExp, 'Pattern', @pattern
    if( @res <> 0) return '..Pattern property set failed'
    exec @res=sp_OASetProperty @objRegexExp, 'IgnoreCase', 0
    if( @res <> 0) return '..IgnoreCase option failed'
    exec @res=sp_OAMethod @objRegexExp, 'Replace', @result OUT,
         @searchstring, @replacestring
    if( @res <> 0) return '..Bad search string'
    exec @res=sp_OADestroy @objRegexExp
    return @result
END

You'll need Ole Automation Procedures turned on in SQL:

exec sp_configure 'show advanced options',1; 
go
reconfigure; 
go
sp_configure 'Ole Automation Procedures', 1; 
go
reconfigure; 
go
sp_configure 'show advanced options',0; 
go
reconfigure;
go

Upvotes: 12

Kalyan Vasanth
Kalyan Vasanth

Reputation: 81

SELECT * from SOME_TABLE where NAME like '%[^A-Z]%'

Or some other expression instead of A-Z

Upvotes: 6

Related Questions