Reputation: 1736
I am storing bodies of text in SQL Server. Some bodies of text contain Unicode characters that will be lost when stored in a VARCHAR column.

As only a small portion of the text bodies stored will require an NVARCHAR column, I have decided to create two columns: one VARCHAR and one NVARCHAR. This way I can save space by storing only the Unicode bodies of text in the NVARCHAR column and the rest in the VARCHAR column.

The question is: how do I detect whether a body of text contains Unicode characters, so that I can determine the best column to store it in?
Upvotes: 1
Views: 494
Reputation: 452947
You could either determine the 256 characters available in your collation's code page and inspect the string for any characters not in that set, or cast the string to varchar and compare it with the nvarchar original.
If you are using code page 1252, the first approach could be done with

DECLARE @String NVARCHAR(MAX) = N'൯'

SELECT CASE
         WHEN @String LIKE '%[^' COLLATE Latin1_General_100_BIN + CHAR(0) + '-' + CHAR(255) + ']%'
           THEN 'varchar not OK'
         ELSE 'varchar OK'
       END
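The same code-page membership test can be sketched outside the database, in application code, before deciding which column to target. This Python sketch is an illustration, not part of the original answer; it assumes Windows-1252 (cp1252) is the collation's code page:

```python
def fits_varchar(text: str, codepage: str = "cp1252") -> bool:
    """True if every character exists in the single-byte code page,
    i.e. the text would survive storage in a VARCHAR column."""
    try:
        text.encode(codepage)  # raises if any character is outside the code page
        return True
    except UnicodeEncodeError:
        return False

print(fits_varchar("plain ASCII text"))  # True  -> varchar OK
print(fits_varchar("\u0d6f"))            # False -> Malayalam digit nine, varchar not OK
```

Checking in the application lets you route each body of text to the right column in the INSERT statement rather than testing it server-side.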
and the second approach with

DECLARE @String NVARCHAR(MAX) = N'൯'

SELECT CASE
         WHEN CAST(@String AS VARCHAR(MAX)) = @String
           THEN 'varchar OK'
         ELSE 'varchar not OK'
       END
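The round-trip comparison in the second approach can likewise be mimicked client-side. This Python sketch (again assuming cp1252 as the VARCHAR code page, which is an assumption, not something stated in the answer) performs a lossy encode, analogous to the CAST to VARCHAR, then compares the decoded result with the original:

```python
def roundtrip_ok(text: str, codepage: str = "cp1252") -> bool:
    """Encode with replacement (lossy, like CAST to VARCHAR), decode back,
    and report whether the round trip preserved the original string."""
    narrowed = text.encode(codepage, errors="replace")
    return narrowed.decode(codepage) == text

print(roundtrip_ok("café"))    # True  -> varchar OK (é exists in cp1252)
print(roundtrip_ok("\u0d6f"))  # False -> character lost, varchar not OK
```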
BTW: If you use row compression, you also get Unicode compression thrown in, which would largely negate the need for this.
Upvotes: 3