Reputation: 1432
I have a table called airports
in a SQL Server database, with a column declared as nvarchar(255)
. I had to declare it as nvarchar
otherwise SSIS failed to import the data from a .csv
file generated by an API.
I have approx 25k records in this table, where by from what I can tell 763 have Unicode characters in them, by running this query:
select cast(name as varchar), name
from airports
where cast(name as varchar) <> name
The first row shows the following two values returned in column 1 and 2
Harrisburg Capital City Airpor
Harrisburg Capital City Airport
The first value from column 1 has had the last t stripped off it, which I assume means there is one unicode character in the string. Please let me know if I am wrong, as I am a bit useless with unicode characters.
My question is: how can I find the unicode characters in the column, and is there a safe / recommended way to remove them?
I did try this to see if I could find it, but it didn't do what I thought it would do.
set nocount on
DECLARE @nstring NVARCHAR(100)
SET @nstring =(select name from airports where fs = 'HAR')
DECLARE @position INT
SET @position = 1
DECLARE @CharList TABLE (Position INT,UnicodeChar NVARCHAR(1),UnicodeValue INT)
WHILE @position <= DATALENGTH(@nstring)
BEGIN
INSERT @CharList
SELECT
@position as Position,
CONVERT(nchar(1),SUBSTRING(@nstring, @position, 1)) as UnicodeChar,
UNICODE(SUBSTRING(@nstring, @position, 1)) as UnicodeValue
SET @position = @position + 1
END
SELECT *
FROM @CharList[/sql]
ORDER BY unicodevalue
The output is as follows
32 NULL
33 NULL
34 NULL
35 NULL
36 NULL
37 NULL
38 NULL
39 NULL
40 NULL
41 NULL
42 NULL
43 NULL
44 NULL
45 NULL
46 NULL
47 NULL
48 NULL
49 NULL
50 NULL
51 NULL
52 NULL
53 NULL
54 NULL
55 NULL
56 NULL
57 NULL
58 NULL
59 NULL
60 NULL
61 NULL
62 NULL
11 32
19 32
24 32
25 A 65
20 C 67
12 C 67
1 H 72
2 a 97
13 a 97
17 a 97
7 b 98
10 g 103
15 i 105
5 i 105
21 i 105
26 i 105
18 l 108
29 o 111
28 p 112
14 p 112
9 r 114
3 r 114
4 r 114
30 r 114
27 r 114
6 s 115
16 t 116
22 t 116
31 t 116
8 u 117
23 y 121
Upvotes: 1
Views: 5306
Reputation: 50163
However, if you want to first find the records which have some unicode
chars then follow below approach with help of case
expression
;WITH CTE
AS (
SELECT DATA,
CASE
WHEN(CAST(DATA AS VARCHAR(MAX)) COLLATE SQL_Latin1_General_Cp1251_CS_AS) = DATA
THEN 0
ELSE 1
END HasUnicodeChars,
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) RN
FROM <table_name>)
SELECT * FROM CTE where HasUnicodeChars = 1
Upvotes: 1