Ab Bennett
Ab Bennett

Reputation: 1432

Identifying unicode character in nvarchar column in SQL Server

I have a table called airports in a SQL Server database, with a column declared as nvarchar(255). I had to declare it as nvarchar otherwise SSIS failed to import the data from a .csv file generated by an API.

I have approx 25k records in this table, where by from what I can tell 763 have Unicode characters in them, by running this query:

select cast(name as varchar), name 
from airports 
where cast(name as varchar) <> name

The first row shows the following two values returned in column 1 and 2

Harrisburg Capital City Airpor  
Harrisburg Capital City Airport

The first value from column 1 has had the last t stripped off it, which I assume means there is one unicode character in the string. Please let me know if I am wrong, as I am a bit useless with unicode characters.

My question is: how can I find the unicode characters in the column, and is there a safe / recommended way to remove them?

I did try this to see if I could find it, but it didn't do what I thought it would do.

set nocount on
DECLARE @nstring NVARCHAR(100)
SET @nstring =(select name from airports where fs = 'HAR')
DECLARE @position INT
SET @position = 1

DECLARE @CharList TABLE (Position INT,UnicodeChar NVARCHAR(1),UnicodeValue INT)

WHILE @position <= DATALENGTH(@nstring)
BEGIN
    INSERT @CharList
       SELECT 
           @position as Position,
           CONVERT(nchar(1),SUBSTRING(@nstring, @position, 1)) as UnicodeChar,
           UNICODE(SUBSTRING(@nstring, @position, 1)) as UnicodeValue
    SET @position = @position + 1
END

SELECT * 
FROM @CharList[/sql]
ORDER BY unicodevalue

The output is as follows

32      NULL
33      NULL
34      NULL
35      NULL
36      NULL
37      NULL
38      NULL
39      NULL
40      NULL
41      NULL
42      NULL
43      NULL
44      NULL
45      NULL
46      NULL
47      NULL
48      NULL
49      NULL
50      NULL
51      NULL
52      NULL
53      NULL
54      NULL
55      NULL
56      NULL
57      NULL
58      NULL
59      NULL
60      NULL
61      NULL
62      NULL
11      32
19      32
24      32
25  A   65
20  C   67
12  C   67
1   H   72
2   a   97
13  a   97
17  a   97
7   b   98
10  g   103
15  i   105
5   i   105
21  i   105
26  i   105
18  l   108
29  o   111
28  p   112
14  p   112
9   r   114
3   r   114
4   r   114
30  r   114
27  r   114
6   s   115
16  t   116
22  t   116
31  t   116
8   u   117
23  y   121

Upvotes: 1

Views: 5306

Answers (1)

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

However, if you want to first find the records which have some unicode chars then follow below approach with help of case expression

;WITH CTE
 AS (
 SELECT DATA, 
        CASE
            WHEN(CAST(DATA AS VARCHAR(MAX)) COLLATE SQL_Latin1_General_Cp1251_CS_AS) = DATA
            THEN 0
            ELSE 1
        END HasUnicodeChars,
       ROW_NUMBER() OVER (ORDER BY (SELECT 1)) RN
 FROM <table_name>)

SELECT * FROM CTE where HasUnicodeChars = 1

Upvotes: 1

Related Questions