Reputation: 825
I'm storing XML in an XML column in SQL Server. SQL Server stores the data internally in UTF-16. Therefore the XML that is stored has to be in UTF-16.
The XML I have is in utf-8, it has this declaration on top:
<?xml version="1.0" encoding="UTF-8" ?>
When I try to insert xml with the UTF-8 declaration I get an exception saying something about the encoding. I can easily fix this in two ways:
by removing the declaration or
by changing the declaration to
:
<?xml version="1.0" encoding="UTF-16" ?>
Problem
I don't know if it's 'safe' or correct to just remove or replace the declaration. Will I lose data, or will the XML become corrupt? Or do I have to convert the string in C# from utf-8 to utf-16?
Upvotes: 3
Views: 1590
Reputation: 238296
C# stores strings in UCS-2, an older version of the UTF-16 standard. So when you read a UTF-8 string in C#, C# converts it to UCS-2. It's the UCS-2 variant that you transmit to SQL Server.
You can change the xml declaration to encoding="UTF-16"
or omit it altogether. There are some differences between UCS-2 and UTF-16; I'd be interesting in knowing how that affects C# and SQL Server!
Upvotes: 3
Reputation: 13547
SQL Server internally uses UCS-2 to store XML data, but this has nothing to do with the form in which you pass the data to SQL Server.
If for example you insert it using a varchar
literal, make it an nvarchar
literal instead and declare the encoding to be UTF-16. Sample:
DECLARE @VAR XML
INSERT INTO MyTable (MyXmlColumn)
VALUES (N'<?xml version="1.0" encoding="UTF-16" ?><doc></doc>')
Upvotes: 0