user1837862

Reputation: 121

XML Deserialize with UTF-8 encoding

I have searched a lot today and can't find how to deserialize XML with UTF-8 encoding. This is the file:

 <?xml version="1.0" encoding="UTF-8"?>
 <AvailabilityRequestV2 xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 siteid="0000"
 apikey="0000"
 async="false" waittime="0">
 <Type>4</Type>
 <Id>159266</Id>
 <Radius>0</Radius>
 <Latitude>0</Latitude>
 <Longitude>0</Longitude>
 </AvailabilityRequestV2>

I tried this:

 string xmlString = File.ReadAllText("request.xml"); // hypothetical path to the XML file above
 XmlSerializer serializer = new XmlSerializer(typeof(AvailabilityRequestV2));
 AvailabilityRequestV2 request = (AvailabilityRequestV2)serializer.Deserialize(
     new MemoryStream(Encoding.UTF8.GetBytes(xmlString)));

When I hover over request in the debugger, I see this:

     {<?xml version="1.0" encoding="utf-16"?><AvailabilityRequestV2 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      ..................

How can I force it to be UTF-8?

I have only found examples for serializing, not for deserializing.

Upvotes: 3

Views: 26753

Answers (3)

Lloyd

Reputation: 29668

You can use a StreamReader and specify UTF-8. You can also tell it to detect the encoding from the BOM, if one is present:

using (StreamReader reader = new StreamReader("my.xml",Encoding.UTF8,true)) {
    XmlSerializer serializer = new XmlSerializer(typeof(SomeType));

    object result = serializer.Deserialize(reader);
}

I'm unsure, however, what happens when the XML reader encounters an encoding="utf-16" directive within the XML; it may switch over.

Upvotes: 12

Tom Schulte

Reputation: 439

I think the example from @Lloyd needs the new keyword:

using (StreamReader reader = new StreamReader("my.xml",Encoding.UTF8,true)) {

Upvotes: 2

Nicholas Carey

Reputation: 74257

Once you have slurped the contents of a file into a .NET/CLR string, it is UTF-16 encoded: it has been transformed from its original source encoding. The CLR uses UTF-16 internally, which is why a char is 16 bits.

As a result, the encoding specified in the document's [original] XML Declaration is now at odds with the actual encoding of the document.

Best to pass a StreamReader as recommended by @Lloyd above.
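
If the XML is already in a string, as in the question, a StringReader can be passed to Deserialize instead of re-encoding the text into a MemoryStream. The following is only a minimal sketch: the AvailabilityRequestV2 class and the RequestParser helper are hypothetical stand-ins (the asker's real class is not shown), and only a few of the fields are mapped.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's class; only a few members are mapped.
public class AvailabilityRequestV2
{
    [XmlAttribute("siteid")]
    public string SiteId { get; set; }

    public int Type { get; set; }
    public int Id { get; set; }
}

// Hypothetical helper, not part of any library.
public static class RequestParser
{
    // Deserialize from text that is already a .NET string.
    // The StringReader hands the serializer decoded characters,
    // so no byte-level decoding is needed at this point.
    public static AvailabilityRequestV2 FromString(string xml)
    {
        var serializer = new XmlSerializer(typeof(AvailabilityRequestV2));
        using (var reader = new StringReader(xml))
        {
            return (AvailabilityRequestV2)serializer.Deserialize(reader);
        }
    }
}
```

Calling RequestParser.FromString(xmlString) with the document from the question should populate Type and Id; elements that the sketch does not map, such as Radius, are simply skipped.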

Upvotes: 3
