Reputation: 2066
I have a table with around 4 million records and I need to remove any instance of the ASCII characters in the following range(s) in a VARCHAR(MAX)
column.
Please note the above represent the decimal identifiers for each character, the actual ASCII characters to be replaced can be crosschecked against http://www.asciitable.com/.
I have tried the following ways and achieved the desired results.
First way: using nested REPLACE
functions:
UPDATE [MyTable]
SET replace_ascii_chars = REPLACE(REPLACE(replace_ascii_chars, char(0), '')
,char(1), '')....
Second way: created the following SQL function and ran an UPDATE
statement:
CREATE FUNCTION [dbo].RemoveASCIICharactersInRange(@InputString VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
IF @InputString IS NOT NULL
BEGIN
DECLARE @Counter INT, @TestString NVARCHAR(40)
SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + NCHAR(127)+ ']%'
SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
WHILE @Counter <> 0
BEGIN
SELECT @InputString = STUFF(@InputString, @Counter, 1, '')
SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
END
END
RETURN(@InputString)
END
UPDATE SQL:
UPDATE [MyTable]
SET replace_ascii_chars = [dbo].RemoveASCIICharactersInRange(replace_ascii_chars)
Both the aforementioned ways worked fine but seems to be a bit slow (taking around 1.5 hours). Is there way to speed it up in SQL Server? I do not want to do the replacements in the application layer as the values are already saved in this way.
Also, not all 4 million records do not have these characters, so is there a way to speed it up?
EDIT: I was wrong in saying WHERE clause taking longer (it was slower because of the way I used where clause earlier). I have updated the where clause as per @MWillemse suggestion and it seems to be very fast now.
Upvotes: 1
Views: 19636
Reputation: 2066
The query is now running faster after I have added a WHERE clause as per @MWillemse suggestion.
UPDATE [MyTable]
SET replace_ascii_chars = REPLACE(REPLACE(replace_ascii_chars, char(0), '')
,char(1), '')....
WHERE replace_ascii_chars LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127)+ ']%'
Thank you @MWillemse, @jarlh, @Balm for your valuable comments.
EDIT: Please note this would work only from SQL SERVER 2012
and above. Please follow this stackoverflow link for discussion on issues with LOB data types and certain collations when used with REPLACE
function in older SQL Server
versions.
Upvotes: 2