fretje

Reputation: 8382

XmlTextWriter serialization problem

I'm trying to create a piece of xml. I've created the dataclasses with xsd.exe. The root class is MESSAGE.

So after creating a MESSAGE and filling all its properties, I serialize it like this:

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
StringWriter sw = new StringWriter();
serializer.Serialize(sw, response);
string xml = sw.ToString();

Up to this point everything works: the string xml contains valid (UTF-16 encoded) xml. Now I'd like to create the xml with UTF-8 encoding instead, so I do it like this:

Edit: forgot to include the declaration of the stream

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

And here comes the problem: with this approach, the xml string is prefixed with an invalid char (the infamous square).
When I inspect the char like this:

char c = xml[0];

I can see that c has a value of 65279.
Does anybody have a clue where this is coming from?
I can easily work around it by cutting off the first char:

xml = xml.Substring(1);

But I'd rather know what's going on than blindly cut off the first char.

Can anybody shed some light on this? Thanks!

Upvotes: 16

Views: 6733

Answers (2)

Chris W. Rea

Reputation: 5501

Here's your code modified to not prepend the byte-order-mark (BOM):

var serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
// Passing false tells UTF8Encoding not to emit a byte-order mark.
Encoding utf8EncodingWithNoByteOrderMark = new UTF8Encoding(false);
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, utf8EncodingWithNoByteOrderMark);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

Upvotes: 17

Jon Skeet

Reputation: 1504062

65279 (0xFEFF) is the Unicode byte order mark (BOM). Assuming that really is the value you're seeing, you can get rid of it by creating a UTF8Encoding instance which doesn't emit a BOM. (See the constructor overloads for details.)
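To see where the extra character comes from, here is a minimal sketch that compares the preambles of the two encodings. The class name BomDemo is made up for illustration; the calls themselves are standard System.Text members.

```csharp
using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Encoding.UTF8 is configured to emit a BOM; its preamble is the bytes EF BB BF.
        byte[] withBom = Encoding.UTF8.GetPreamble();

        // UTF8Encoding(false) emits no BOM, so its preamble is empty.
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();

        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF
        Console.WriteLine(withoutBom.Length);              // 0

        // GetString does not strip the preamble: the three bytes decode
        // to the single char U+FEFF, which is 65279 in decimal.
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine((int)decoded[0]);                // 65279
    }
}
```

The writer puts those three bytes at the start of the stream, and decoding the whole buffer with Encoding.UTF8.GetString turns them into the leading character.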

However, there's an easier way of getting UTF-8 out. You can use StringWriter, but a derived class which overrides the Encoding property. See this answer for an example.
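A rough sketch of that idea follows. The names Utf8StringWriter and Message are hypothetical; Message only stands in for the xsd.exe-generated Xsd.MESSAGE class so that the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 as its encoding,
// so the XML declaration says utf-8 instead of utf-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

// Stand-in for the generated Xsd.MESSAGE class (assumption for this sketch).
public class Message
{
    public string Text { get; set; }
}

static class Program
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Message));
        using (var sw = new Utf8StringWriter())
        {
            serializer.Serialize(sw, new Message { Text = "hello" });

            // The result is an ordinary .NET string, so no BOM char is involved.
            string xml = sw.ToString();
            Console.WriteLine(xml);
        }
    }
}
```

Since nothing is ever encoded to bytes and decoded again, the BOM never enters the picture. If the xml is later written to a file or a socket, the encoding chosen at that point still decides whether a BOM is emitted.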

Upvotes: 7
