Reputation: 1736
I am storing bodies of text in SQL Server. Some bodies of text contain Unicode characters that will be lost when stored in a VARCHAR column.

As only a small portion of the text bodies stored will require an NVARCHAR column, I have decided to create two columns: one VARCHAR and one NVARCHAR. This way I can save space by storing only the Unicode bodies of text in the NVARCHAR column and the rest in the VARCHAR column.

The question is: how do I detect whether a body of text contains Unicode characters, so that I can determine the best column to store it in?
Upvotes: 1
Views: 494
Reputation: 452947
You could either determine the 256 characters available in your collation's code page and inspect the string for any characters not in that set, or cast the string to varchar and compare it with the nvarchar original.
If you are using code page 1252, the first approach could be done with

DECLARE @String NVARCHAR(MAX) = N'൯'

SELECT CASE
         WHEN @String LIKE '%[^' COLLATE Latin1_General_100_BIN + CHAR(0) + '-' + CHAR(255) + ']%'
           THEN 'varchar not OK'
         ELSE 'varchar OK'
       END
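The same code-page membership test can be sketched outside the database, in application code, before deciding which column to target. This Python sketch is an illustration, not part of the original answer; it assumes Windows-1252 (cp1252) is the collation's code page:

```python
def fits_varchar(text: str, codepage: str = "cp1252") -> bool:
    """True if every character exists in the single-byte code page,
    i.e. the text would survive storage in a VARCHAR column."""
    try:
        text.encode(codepage)  # raises if any character is outside the code page
        return True
    except UnicodeEncodeError:
        return False

print(fits_varchar("plain ASCII text"))  # True  -> varchar OK
print(fits_varchar("\u0d6f"))            # False -> Malayalam digit nine, varchar not OK
```

Checking in the application lets you route each body of text to the right column in the INSERT statement rather than testing it server-side.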
and the second approach with

DECLARE @String NVARCHAR(MAX) = N'൯'

SELECT CASE
         WHEN CAST(@String AS VARCHAR(MAX)) = @String
           THEN 'varchar OK'
         ELSE 'varchar not OK'
       END
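The round-trip comparison in the second approach can likewise be mimicked client-side. This Python sketch (again assuming cp1252 as the VARCHAR code page, which is an assumption, not something stated in the answer) performs a lossy encode, analogous to the CAST to VARCHAR, then compares the decoded result with the original:

```python
def roundtrip_ok(text: str, codepage: str = "cp1252") -> bool:
    """Encode with replacement (lossy, like CAST to VARCHAR), decode back,
    and report whether the round trip preserved the original string."""
    narrowed = text.encode(codepage, errors="replace")
    return narrowed.decode(codepage) == text

print(roundtrip_ok("café"))    # True  -> varchar OK (é exists in cp1252)
print(roundtrip_ok("\u0d6f"))  # False -> character lost, varchar not OK
```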
BTW: If you use row compression, you also get Unicode compression thrown in, which would largely negate the need for this.
Upvotes: 3