user369117
user369117

Reputation: 825

Store UTF8 data in UTF16 column

I'm storing XML in an XML column in SQL Server. SQL Server stores the data internally in UTF-16. Therefore the XML that is stored has to be in UTF-16.

The XML I have is in utf-8, it has this declaration on top:

<?xml version="1.0" encoding="UTF-8" ?>

When I try to insert xml with the UTF-8 declaration I get an exception saying something about the encoding. I can easily fix this in two ways:

:

<?xml version="1.0" encoding="UTF-16" ?>

Problem

I don't know if it's 'safe' or correct to just remove or replace the declaration. Will I lose data, or will the XML become corrupt? Or do I have to convert the string in C# from utf-8 to utf-16?

Upvotes: 3

Views: 1590

Answers (2)

Andomar
Andomar

Reputation: 238296

C# stores strings in UCS-2, an older version of the UTF-16 standard. So when you read a UTF-8 string in C#, C# converts it to UCS-2. It's the UCS-2 variant that you transmit to SQL Server.

You can change the xml declaration to encoding="UTF-16" or omit it altogether. There are some differences between UCS-2 and UTF-16; I'd be interesting in knowing how that affects C# and SQL Server!

Upvotes: 3

Jirka Hanika
Jirka Hanika

Reputation: 13547

SQL Server internally uses UCS-2 to store XML data, but this has nothing to do with the form in which you pass the data to SQL Server.

If for example you insert it using a varchar literal, make it an nvarchar literal instead and declare the encoding to be UTF-16. Sample:

DECLARE @VAR XML
INSERT INTO MyTable (MyXmlColumn) 
    VALUES (N'<?xml version="1.0" encoding="UTF-16" ?><doc></doc>')

Upvotes: 0

Related Questions