Reputation: 4891
I am seeing some behaviour I don't expect with XmlTextWriter. When I specify the encoding while instantiating the writer, using either
new XmlTextWriter(fs, Encoding.UTF8)
or
XmlWriter.Create(fs, new XmlWriterSettings(){Encoding = Encoding.UTF8} )
the document produced has a leading hex character at the start of the document. Since the C++ parser I am passing the XML to cannot read this, I want to avoid this character. Interestingly, when I create the writer like this
new XmlTextWriter(fs, null)
I get the exact behaviour I expect. How do I recreate this instantiation in code without leaving the parameter null?
Upvotes: 0
Views: 2210
Reputation: 1135
I think the "leading hex character" is a byte order mark (BOM) as I commented on your question, though I can't be sure without actually seeing it. The C++ parser seems not to know about BOMs, which is odd (see standard reference by Joel Spolsky).
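One way to confirm the guess is to look at the first bytes of the generated file. The following is only a minimal sketch: the class and method names (BomProbe, HasUtf8Bom) are made up for illustration, and it assumes the file path is passed as the first command-line argument.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class BomProbe
{
    // Hypothetical helper: returns true if the stream starts with the UTF-8 BOM.
    public static bool HasUtf8Bom(Stream stream)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // the bytes EF BB BF
        byte[] head = new byte[bom.Length];

        // A single Read is enough for file and memory streams; other stream
        // types may return fewer bytes and would need a read loop.
        int read = stream.Read(head, 0, head.Length);
        return read == bom.Length && head.SequenceEqual(bom);
    }

    static void Main(string[] args)
    {
        // Assumption: args[0] is the path of the XML file written earlier.
        using (var fs = File.OpenRead(args[0]))
        {
            Console.WriteLine(HasUtf8Bom(fs));
        }
    }
}
```

If this reports a BOM for the file written with Encoding.UTF8, the "leading hex character" is almost certainly those three bytes.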
Let's assume that the C++ parser works only with XML encoded as UTF-8 or a byte-compatible subset of it such as ASCII. In that case you have no option but to encode as UTF-8 but exclude the BOM. XmlWriter lets you do so as follows:
var utf8NoBom = new UTF8Encoding(false);
var writer = XmlWriter.Create(fs, new XmlWriterSettings() { Encoding = utf8NoBom } );
The quote below is from the MSDN reference on XmlWriter.Create:
XmlWriter always writes a Byte Order Mark (BOM) to the underlying data stream; however, some streams must not have a BOM. To omit the BOM, create a new XmlWriterSettings object and set the Encoding property to be a new UTF8Encoding object with the Boolean value in the constructor set to false.
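To see the difference without involving the C++ parser, here is a rough sketch that writes the same tiny document to a MemoryStream with both encodings and prints the first byte of each result. The names BomDemo and WriteSample and the "root" element are invented for the example. A MemoryStream stands in for the FileStream from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class BomDemo
{
    // Hypothetical helper: writes a one-element document and returns the raw bytes.
    public static byte[] WriteSample(Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = encoding };
            using (var writer = XmlWriter.Create(ms, settings))
            {
                writer.WriteStartElement("root");
                writer.WriteEndElement();
            } // disposing the writer flushes it; the stream stays open by default

            return ms.ToArray();
        }
    }

    static void Main()
    {
        byte[] withBom = WriteSample(Encoding.UTF8);
        byte[] withoutBom = WriteSample(new UTF8Encoding(false));

        // First byte of each output, in hex.
        Console.WriteLine(withBom[0].ToString("X2"));
        Console.WriteLine(withoutBom[0].ToString("X2"));
    }
}
```

With Encoding.UTF8 the output should begin with EF (the first BOM byte), while with new UTF8Encoding(false) it should begin with 3C, the '<' that opens the XML declaration.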
EDIT: If the C++ parser is a general-purpose XML parser then its ignorance of BOMs is odd. If the parser is domain-specific, i.e. if it is always used with files whose character encoding is known (and obviously limited), then its ignorance is not odd. I think this is Spolsky's point.
Upvotes: 2