Sathish
Sathish

Reputation: 2066

SQL Server: Replace a list of ASCII characters from a VARCHAR(MAX) field

I have a table with around 4 million records and I need to remove any instance of the ASCII characters in the following range(s) in a VARCHAR(MAX) column.

Please note the above represent the decimal identifiers for each character, the actual ASCII characters to be replaced can be crosschecked against http://www.asciitable.com/.

I have tried the following ways and achieved the desired results.

First way: using nested REPLACE functions:

UPDATE [MyTable] 
SET replace_ascii_chars = REPLACE(REPLACE(replace_ascii_chars, char(0), '') 
,char(1), '')....

Second way: created the following SQL function and ran an UPDATE statement:

CREATE FUNCTION [dbo].RemoveASCIICharactersInRange(@InputString VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    IF @InputString IS NOT NULL
    BEGIN
      DECLARE @Counter INT, @TestString NVARCHAR(40)

      SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + NCHAR(127)+ ']%'

      SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)

      WHILE @Counter <> 0
      BEGIN
        SELECT @InputString = STUFF(@InputString, @Counter, 1, '')
        SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
      END
    END
    RETURN(@InputString)
END

UPDATE SQL:

UPDATE [MyTable] 
SET replace_ascii_chars = [dbo].RemoveASCIICharactersInRange(replace_ascii_chars)

Both the aforementioned ways worked fine but seems to be a bit slow (taking around 1.5 hours). Is there way to speed it up in SQL Server? I do not want to do the replacements in the application layer as the values are already saved in this way.

Also, not all 4 million records do not have these characters, so is there a way to speed it up?

EDIT: I was wrong in saying WHERE clause taking longer (it was slower because of the way I used where clause earlier). I have updated the where clause as per @MWillemse suggestion and it seems to be very fast now.

Upvotes: 1

Views: 19636

Answers (1)

Sathish
Sathish

Reputation: 2066

The query is now running faster after I have added a WHERE clause as per @MWillemse suggestion.

UPDATE [MyTable] 
SET replace_ascii_chars = REPLACE(REPLACE(replace_ascii_chars, char(0), '') 
,char(1), '')....
WHERE replace_ascii_chars LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127)+ ']%'

Thank you @MWillemse, @jarlh, @Balm for your valuable comments.

EDIT: Please note this would work only from SQL SERVER 2012 and above. Please follow this stackoverflow link for discussion on issues with LOB data types and certain collations when used with REPLACE function in older SQL Server versions.

Upvotes: 2

Related Questions