ygoe

Reputation: 20414

Read a space character from an XML element

A surprisingly simple question this time! :-) There's an XML file like this:

<xml>
  <data> </data>
</xml>

Now I need to read exactly whatever is in the <data> element, even if it is just a single whitespace character like U+0020. My naive guess:

XmlDocument xd = new XmlDocument();
xd.Load(fileName);
XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
string data = xn.InnerText;

But that returns an empty string. The white space got lost. Any other data can be read just fine.

What do I need to do to get my space character here?

After browsing the web for a while, I tried reading the XML file with an XmlReader, which lets me set XmlReaderSettings.IgnoreWhitespace = false, but that didn't help.

Upvotes: 2

Views: 3165

Answers (1)

CC Inc

Reputation: 5938

You can use xml:space="preserve" in your XML, as described in the W3C XML specification and the MSDN docs.

The W3C standards dictate that white space be handled differently depending on where in the document it occurs, and depending on the setting of the xml:space attribute. If the characters occur within the mixed element content or inside the scope of the xml:space="preserve", they must be preserved and passed without modification to the application. Any other white space does not need to be preserved. The XmlTextReader only preserves white space that occurs within an xml:space="preserve" context.

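        using System;
        using System.Xml;

        // xml:space="preserve" on the root element also applies to its descendants, including <data>.
        XmlDocument xd = new XmlDocument();
        xd.LoadXml(@"<xml xml:space=""preserve""><data> </data></xml>");
        XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
        string data = xn.InnerText; // data == " "
        Console.WriteLine(data == " "); //True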

Tested HERE.
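If you cannot change the XML file itself, another option is to set XmlDocument.PreserveWhitespace to true before loading. The following is a minimal sketch of that approach, not part of the tested snippet above; the class name WhitespaceDemo and the helper ReadData are hypothetical names chosen for illustration.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Hypothetical helper: returns the text content of the <data> child
    // of the root element, keeping whitespace-only content intact.
    public static string ReadData(string xml)
    {
        var xd = new XmlDocument();
        // Must be set before loading; with the default (false),
        // whitespace-only text nodes are dropped during the load.
        xd.PreserveWhitespace = true;
        xd.LoadXml(xml);
        XmlNode xn = xd.DocumentElement.SelectSingleNode("data");
        return xn.InnerText;
    }

    public static void Main()
    {
        // Same document as in the question, without xml:space="preserve".
        string data = ReadData("<xml>\n  <data> </data>\n</xml>");
        Console.WriteLine(data == " ");
    }
}
```

The same property should work with xd.Load(fileName) as in the question. Note that it also keeps the indentation between elements as whitespace nodes in the document, which matters if you walk ChildNodes directly.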

Upvotes: 7
