Kata Garamszegi
Kata Garamszegi

Reputation: 25

MSSQL Replace issue when searching for white spaces/illegal characters in nvarchar

I just met with a really strange issue in one of our customer database. We use a function, to replace all illegal characters (in hexa format) with empty strings before casting the final string to XML:

ALTER FUNCTION [META].[UDF_REMOVE_ILLEGAL_CHARACTERS] 
(
    @xml NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    return REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
           REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
           REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
           REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
           REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( 
           REPLACE( REPLACE( REPLACE( REPLACE( 
           @xml
           ,char(0x0000),'') ,char(0x0001),'') ,char(0x0002),'') ,char(0x0003),'') ,char(0x0004),'') 
           ,char(0x0005),'') ,char(0x0006),'') ,char(0x0007),'') ,char(0x0008),'') ,char(0x000B),'') 
           ,char(0x000C),'') ,char(0x000E),'') ,char(0x000F),'') ,char(0x0010),'') ,char(0x0011),'') 
           ,char(0x0012),'') ,char(0x0013),'') ,char(0x0014),'') ,char(0x0015),'') ,char(0x0016),'') 
           ,char(0x0017),'') ,char(0x0018),'') ,char(0x0019),'') ,char(0x001A),'') ,char(0x001B),'') 
           ,char(0x001C),'') ,char(0x001D),'') ,char(0x001E),'') ,char(0x001F),'')

END

It works just fine in all of our databases except one. In that database the length of the string decreases drastically for some reason. I tested this issue around and found that if I try to replace any whitespaces or illegal characters in any format (simply empty string, char(0) as below, hexa as above, illegal characters copied) it just cuts down the input string and in case of casting to XML it results in an "unexpected end of input" error. We are talking about 2306306183 characters long nvarchar variables as input for our function (however not greater than 2gb so this is not the issue).

error

I also tried another method with a while cycle using stuff and patindex, and it works fine, but for such a huge amount of data it makes performance issues, so we can't release this for customer use (from 2 hours to 16 hours for the whole process):

DECLARE @IncorrectCharLoc INT, --Position of bad character
        @Pattern          CHAR(37)  --Bad characters to look for

SELECT @Pattern          = '%['
                          + CHAR(0)+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)
                          + CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)
                          + CHAR(10)+CHAR(11)+CHAR(12)+CHAR(13)+CHAR(14)
                          + CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)
                          + CHAR(20)+CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)
                          + CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)
                          + CHAR(30)+CHAR(31)+CHAR(127)
                          + ']%',
       @IncorrectCharLoc = PATINDEX(@Pattern, @XMLInitiative)

WHILE @IncorrectCharLoc > 0
BEGIN
    SELECT @XMLInitiative          = STUFF(@XMLInitiative, @IncorrectCharLoc, 1, ''),
           @IncorrectCharLoc = PATINDEX(@Pattern, @XMLInitiative)
end

Is there any database setup that can cause issues with replace like this? I compared all collations on database and server level too with our working databases, and they are all the same, so this can't be the issue.

Any ideas? :)

Thank you!

Upvotes: 1

Views: 263

Answers (0)

Related Questions