Cannot store particular Unicode code points / characters in NVARCHAR fields

Question

I'm doing some tests with SQL Server 2017. I'm trying to store arbitrary Unicode code points in an NVARCHAR column. I've tried different collations. I have no problem with common characters in the Basic Multilingual Plane (BMP) of Unicode.

For more exotic symbols, for example if I try to store the "𝌹" character (U+1D339), the following happens:

If I do it within Management Studio, I only see the infamous square symbol. But Management Studio has the proper font since I can paste it in the query editor.
If I send the text from Visual Studio, the value I see in Management Studio is "??". That's also what I retrieve from Visual Studio after performing a query.

My understanding is, for non-supplementary character collations, characters outside the UCS-2 subset shouldn't be interpreted correctly because NCHAR fields are limited to 2 bytes.

But, I tried with Latin1_General_100_CS_AS_KS_WS_SC, both at the DB level and column level, and it doesn't seem to work either.

Any ideas? Thanks

Panagiotis Kanavos · Accepted Answer

I can't reproduce any data loss or encoding issue. I can reproduce a square that becomes 𝌹 when copied. It's probably caused by the font used to display results in the SSMS grid or the Visual Studio debugger windows.

SQL Server and Windows have used UTF-16 for some time now, not UCS-2. Few fonts cover the full range of characters that UTF-16 can encode, though.
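
One way to see the UTF-16 storage directly is to compare LEN under a supplementary-character (_SC) collation and a non-SC one. This is only a minimal sketch: it assumes SQL Server 2012 or later, where the _SC collations exist, and the two collation names are just examples I picked, not the ones from the question.

```sql
-- Sketch: the variable name and the two collations are illustrative choices.
DECLARE @s nvarchar(10) = N'𝌹';

SELECT
    DATALENGTH(@s)                              AS bytes_stored,   -- 4: two 16-bit code units (a surrogate pair)
    LEN(@s COLLATE Latin1_General_100_CI_AS)    AS len_without_sc, -- 2: each surrogate is counted separately
    LEN(@s COLLATE Latin1_General_100_CI_AS_SC) AS len_with_sc;    -- 1: the pair is counted as one character
```

The bytes stored are the same in both cases. The collation only changes how functions such as LEN count them.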

When I tried this in SSMS:

create table #tc(name nvarchar(20));
insert into #tc values (N'𝌹');

select name,len(name),DATALENGTH(name) from #tc;

I saw a square, 2 and 4 in the grid. This means the character was stored properly and took 4 bytes. When I copied those results to SO, though, I saw:

name    (No column name)    (No column name)
𝌹      2                    4

When I used Results to Text I got the actual character:

name                             
-------------------- ----------- -----------
𝌹                   2           4

The correct character is there, but the SSMS grid's font can't display it.
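
To check what is stored without relying on any font, the raw bytes and the code point can be inspected. This is a sketch that continues from the #tc table created earlier; the column aliases and the _SC collation used for UNICODE are my own choices.

```sql
-- Continues from the #tc table above; aliases and collation are illustrative.
SELECT
    CAST(name AS varbinary(40))                       AS raw_bytes,  -- 0x34D839DF for U+1D339 (UTF-16LE: D834 DF39)
    UNICODE(name COLLATE Latin1_General_100_CI_AS_SC) AS code_point  -- 119609, i.e. 0x1D339
FROM #tc;
```

If raw_bytes came back as 0x3F003F00 instead, the column would hold two literal question marks. That usually suggests the text passed through a non-Unicode (varchar) type or a string literal without the N prefix somewhere on the way in, although I haven't verified that this is what happens in the Visual Studio case.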

Update

As Dan Guzman noted, the font can be changed from Tools --> Options --> Environment --> Fonts and Colors --> Show settings for: --> Grid Results. The default font is Microsoft Sans Serif, a small font (855KB) used as the default font on Windows. It contains "only" 3000 glyphs. Chinese characters aren't included, which is why squares are displayed.

Chinese computers use SimSun as the default font, though, whose file is 17.1MB. They wouldn't have any problem displaying Chinese characters.
