Misiu
Misiu

Reputation: 4919

Find special characters in all rows in specific columns in table

I have a database containing about 50 tables, each table has about 10-100 columns with max 1 milion rows in each table. (quite big like for a newbie :P)

Database is old and some rows contains special characters (invisible characters or some weird unicode) and I would like to remove those characters.

I was searching google and I found a small snippet that lists all columns with specific type:

SELECT
    OBJECT_NAME(col.OBJECT_ID) AS [TableName]
    ,col.[name] AS [ColName]
    ,typ.[name] AS [TypeName]
FROM
    sys.all_columns col
    INNER JOIN sys.types typ
    ON col.user_type_id = typ.user_type_id
WHERE
    col.user_type_id IN (167,231)
    AND
    OBJECT_NAME(col.OBJECT_ID) = 'Orders'

This lists all columns that are varchar or nvarchar.

I found two functions, one that returns a table of all characters from a string and second that checks if string contains any special characters:

CREATE FUNCTION AllCharactersInString (@str nvarchar(max))
RETURNS TABLE 
AS
RETURN
    (SELECT
      substring(B.main_string,C.int_seq,1) AS character
     ,Unicode(substring(B.main_string,C.int_seq,1)) AS unicode_value
     FROM
        (SELECT
        @str AS main_string) B,(SELECT
                                A.int_seq
                                FROM
                                  (SELECT
                                   row_number() OVER (ORDER BY name) AS int_seq
                                   FROM
                                   sys.all_objects) A
                                   WHERE
                                   A.int_seq <= len(@str)) C
    )

And second:

CREATE FUNCTION ContainsInvisibleCharacter  (@str nvarchar(max))
RETURNS int
AS
BEGIN
DECLARE @Result Int
IF exists
 (SELECT
  *
  FROM
     AllCharactersInString(@str)
  WHERE
  unicode_value IN (1,9,10,11,12,13,14,28,29,31,129,141,143,144,157,160))
BEGIN SET @Result = 1
   END
ELSE
BEGIN SET @Result = 0
   END
RETURN @Result
END

My question is how to combine thos two functions into one (if it is possible and if it will be faster) and second: how to run that function on all records in all columns (that are specific type) in a table.

I have this code:

SELECT
    O.Order_Id
    ,Rn_Descriptor
FROM
    dbo.Order O
WHERE
    dbo.ContainsInvisibleCharacter(O.Rn_Descriptor) = 1
    AND
    O.Order_Id IN (SELECT TOP 1000 
                          Order.Order_Id
                      FROM
                          dbo.Order
                      WHERE
                          Order.Rn_Descriptor IS NOT NULL
                      )

But it works sooo slow :/ Mayby there is a fastest way to remove unwanted characters? What will be fine is to find rows containing those characters, list them, then I could manually check them.

Upvotes: 0

Views: 2407

Answers (1)

Anthony Faull
Anthony Faull

Reputation: 17957

You can do this more efficiently using LIKE.

CREATE FUNCTION ContainsInvisibleCharacter(@str nvarchar(max)) RETURNS int
AS
BEGIN
    RETURN
        (SELECT CASE WHEN @str LIKE 
        '%[' + NCHAR(1) + NCHAR(9) + NCHAR(10) + NCHAR(11) + NCHAR(12)
             + NCHAR(13) + NCHAR(14) + NCHAR(28) + NCHAR(29) + NCHAR(31)
             + NCHAR(129) + NCHAR(141) + NCHAR(143) + NCHAR(144)
             + NCHAR(157) + NCHAR(160) + ']%'
         THEN 1 ELSE 0 END)
END

Upvotes: 1

Related Questions