Reputation: 4919
I have a database containing about 50 tables, each table has about 10-100 columns with max 1 milion rows in each table. (quite big like for a newbie :P)
Database is old and some rows contains special characters (invisible characters or some weird unicode) and I would like to remove those characters.
I was searching google and I found a small snippet that lists all columns with specific type:
SELECT
OBJECT_NAME(col.OBJECT_ID) AS [TableName]
,col.[name] AS [ColName]
,typ.[name] AS [TypeName]
FROM
sys.all_columns col
INNER JOIN sys.types typ
ON col.user_type_id = typ.user_type_id
WHERE
col.user_type_id IN (167,231)
AND
OBJECT_NAME(col.OBJECT_ID) = 'Orders'
This lists all columns that are varchar or nvarchar.
I found two functions, one that returns a table of all characters from a string and second that checks if string contains any special characters:
CREATE FUNCTION AllCharactersInString (@str nvarchar(max))
RETURNS TABLE
AS
RETURN
(SELECT
substring(B.main_string,C.int_seq,1) AS character
,Unicode(substring(B.main_string,C.int_seq,1)) AS unicode_value
FROM
(SELECT
@str AS main_string) B,(SELECT
A.int_seq
FROM
(SELECT
row_number() OVER (ORDER BY name) AS int_seq
FROM
sys.all_objects) A
WHERE
A.int_seq <= len(@str)) C
)
And second:
CREATE FUNCTION ContainsInvisibleCharacter (@str nvarchar(max))
RETURNS int
AS
BEGIN
DECLARE @Result Int
IF exists
(SELECT
*
FROM
AllCharactersInString(@str)
WHERE
unicode_value IN (1,9,10,11,12,13,14,28,29,31,129,141,143,144,157,160))
BEGIN SET @Result = 1
END
ELSE
BEGIN SET @Result = 0
END
RETURN @Result
END
My question is how to combine thos two functions into one (if it is possible and if it will be faster) and second: how to run that function on all records in all columns (that are specific type) in a table.
I have this code:
SELECT
O.Order_Id
,Rn_Descriptor
FROM
dbo.Order O
WHERE
dbo.ContainsInvisibleCharacter(O.Rn_Descriptor) = 1
AND
O.Order_Id IN (SELECT TOP 1000
Order.Order_Id
FROM
dbo.Order
WHERE
Order.Rn_Descriptor IS NOT NULL
)
But it works sooo slow :/ Mayby there is a fastest way to remove unwanted characters? What will be fine is to find rows containing those characters, list them, then I could manually check them.
Upvotes: 0
Views: 2407
Reputation: 17957
You can do this more efficiently using LIKE
.
CREATE FUNCTION ContainsInvisibleCharacter(@str nvarchar(max)) RETURNS int
AS
BEGIN
RETURN
(SELECT CASE WHEN @str LIKE
'%[' + NCHAR(1) + NCHAR(9) + NCHAR(10) + NCHAR(11) + NCHAR(12)
+ NCHAR(13) + NCHAR(14) + NCHAR(28) + NCHAR(29) + NCHAR(31)
+ NCHAR(129) + NCHAR(141) + NCHAR(143) + NCHAR(144)
+ NCHAR(157) + NCHAR(160) + ']%'
THEN 1 ELSE 0 END)
END
Upvotes: 1