rmon2852
rmon2852

Reputation: 220

SQL remove characters that aren't in a regex pattern

Suppose i have the following regex pattern of:

%[a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%

How could I iterate through a strings characters and remove (or replace with a blank) any characters that don't match the pattern?

Upvotes: 0

Views: 11675

Answers (3)

ErikE
ErikE

Reputation: 50282

This:

PatIndex('%[^a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%', YourValue)

will return the character at which the pattern matches. In this case, I've added ^ to the beginning so that the pattern matches everything not in the character set.

You can then remove the character at that position, and continue, or replace all occurrences of the found character in the entire string.

FYI: to simulate the offset parameter of CharIndex in order to search starting at a certain character position, you can use Substring to get a portion of the string (or even one character) and use PatIndex on that.

Upvotes: 3

Owain Williams
Owain Williams

Reputation: 2647

As ErikE mentions in his answer, the PATINDEX method is what you need, it is SLOW though.

An answer to a very similar question on SO is stackoverflow.com/a/23001916/4316831. The below code was taken from there:

DECLARE @counter int

SET @counter = 0

WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN  

    WHILE 1 = 1
    BEGIN
        DECLARE @RetVal varchar(50)

        SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
        FROM Table
        WHERE ID_COLUMN = @counter)

        IF(@RetVal IS NOT NULL)       
          UPDATE Table SET
          Column = @RetVal
          WHERE ID_COLUMN = @counter
        ELSE
            break
    END

    SET @counter = @counter + 1
END

Upvotes: 0

rmon2852
rmon2852

Reputation: 220

Ok based on ErikE's help I made the following function

     CREATE FUNCTION dbo.RemoveInvalidCharacters
 (
@pattern as varchar(max),
@string as varchar(max)
)
RETURNS varchar(MAX)
AS
BEGIN

    WHILE PatIndex(@pattern,@string) >0
      BEGIN
      SET @string=REPLACE(@string,SUBSTRING(@string,PatIndex(@pattern,@string),1),'')
END
RETURN @String

END
GO   

Then all I do was call the function as:

  select dbo.RemoveInvalidCharacters('%[^a-zA-Z0-9+&@#\/%=~_|$?!:,.''-]%','Test€€€String^^^')

Then the output is: TestString

Upvotes: 3

Related Questions