Andrew

Reputation: 1554

Escaping Unicode string in XmlElement despite writing XML in UTF-8

For a given XmlElement, I need to be able to set the inner text to an escaped version of the Unicode string, despite the document ultimately being encoded in UTF-8. Is there any way of achieving this?

Here's a simple version of the code:

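using System.IO;
using System.Text;
using System.Xml;

// The input is already an escaped character reference for "ñ".
const string text = "&#xF1;";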

var document = new XmlDocument {PreserveWhitespace = true};
var root = document.CreateElement("root");
root.InnerXml = text;
document.AppendChild(root);

var settings = new XmlWriterSettings {Encoding = Encoding.UTF8, OmitXmlDeclaration = true};
using (var stream = new FileStream("out.xml", FileMode.Create))
using (var writer = XmlWriter.Create(stream, settings))
    document.WriteTo(writer);

Expected:

<root>&#xF1;</root>

Actual:

<root>ñ</root>

Using an XmlWriter directly and calling WriteRaw(text) works, but I only have access to an XmlDocument, and the serialization happens later. On the XmlElement, InnerText escapes the & to &amp;, as expected, and setting Value throws an exception.

Is there some way of setting the inner text of an XmlElement to the escaped ASCII text, regardless of the encoding that is ultimately used? I feel like I must be missing something obvious, or it's just not possible.

Upvotes: 4

Views: 2511

Answers (1)

bobince

Reputation: 536409

If you ask XmlWriter to produce ASCII output, it has to write every non-ASCII character as a character reference, because it cannot represent those characters directly in that encoding.

var settings = new XmlWriterSettings {Encoding = Encoding.ASCII, OmitXmlDeclaration = true};

The output is still valid UTF-8, because ASCII is a subset of UTF-8: any consumer that reads the file as UTF-8 will decode it unchanged.
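
To show the settings above in context, here is a minimal sketch that serializes an XmlDocument through an ASCII-encoded XmlWriter. The class name AsciiXml and the methods Serialize and BuildSample are hypothetical names made up for this illustration; only the XmlWriterSettings values come from the answer. It writes to a MemoryStream rather than a file so the result is easy to inspect.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiXml
{
    // Hypothetical helper: serializes the document with the ASCII encoding,
    // so the writer must emit character references for non-ASCII characters.
    public static string Serialize(XmlDocument document)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            OmitXmlDeclaration = true
        };

        using (var stream = new MemoryStream())
        {
            // Dispose the writer first so everything is flushed to the stream.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                document.WriteTo(writer);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    // Hypothetical sample document mirroring the one in the question.
    public static XmlDocument BuildSample()
    {
        var document = new XmlDocument { PreserveWhitespace = true };
        var root = document.CreateElement("root");
        root.InnerText = "ñ"; // a literal non-ASCII character in the DOM
        document.AppendChild(root);
        return document;
    }
}
```

Calling AsciiXml.Serialize(AsciiXml.BuildSample()) should return a string made up only of ASCII characters, with the ñ written as a character reference. Note that this applies to the whole document at write time, not to a single element: the XmlDocument itself still holds the unescaped character.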

Upvotes: 3
