Reputation: 1148
I have a bunch of questions related to whitespace handling with XmlDocument. Please see the numbered comments in the example below.

1. Shouldn't all whitespace be significant in mixed content? Why is the space between the a tags not significant?
2. While I understand that the actual whitespace node is still an XmlWhitespace, how do I normalize these spaces into XmlSignificantWhitespace nodes? Normalize() doesn't work.
3. Is my only option to do it manually?
Here's my test case:
```csharp
using System;
using System.Linq;
using System.Xml;

internal static class Program
{
    private static void Main()
    {
        // 1. Shouldn't all whitespace be significant in mixed content? Why is the space between the a tags not significant?
        var doc = new XmlDocument
        {
            InnerXml = "<root>test1 <a>test2</a> <a>test3</a></root>",
        };
        PrintDoc(doc);

        // 2.a. While I understand that the actual whitespace node is still an XmlWhitespace, how do I normalize these spaces into XmlSignificantWhitespace nodes?
        doc.DocumentElement.RemoveAll();
        doc.DocumentElement.SetAttribute("xml:space", "preserve");
        var fragment = doc.CreateDocumentFragment();
        fragment.InnerXml = "test1 <a>test2</a> <a>test3</a>";
        doc.DocumentElement.PrependChild(fragment);
        PrintDoc(doc);

        // 2.b. Normalize() doesn't work
        doc.Normalize();
        PrintDoc(doc);

        // 3.a. Manual normalization does work; is there a better way?
        doc.DocumentElement.RemoveAllAttributes();
        var whitespaces = doc.DocumentElement.ChildNodes.Cast<XmlNode>()
            .OfType<XmlWhitespace>()
            .ToList();
        foreach (var whitespace in whitespaces)
        {
            var significant = doc.CreateSignificantWhitespace(whitespace.Value);
            doc.DocumentElement.ReplaceChild(significant, whitespace);
        }
        PrintDoc(doc);

        // 3.b. Reading from a string also works
        doc.InnerXml = "<root xml:space=\"preserve\">test1 <a>test2</a> <a>test3</a></root>";
        PrintDoc(doc);
    }

    // Prints the XML plus the number of XmlWhitespace and XmlSignificantWhitespace children of the root.
    private static void PrintDoc(XmlDocument doc)
    {
        var nodes = doc.DocumentElement.ChildNodes.Cast<XmlNode>().ToList();
        var whitespace = nodes.OfType<XmlWhitespace>().Count();
        var significantWhitespace = nodes.OfType<XmlSignificantWhitespace>().Count();
        Console.WriteLine($"Xml: {doc.InnerXml}\nwhitespace: {whitespace}\nsignificant whitespace: {significantWhitespace}\n");
    }
}
```
The output is as follows:

```
Xml: <root>test1 <a>test2</a><a>test3</a></root>
whitespace: 0
significant whitespace: 0

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 1
significant whitespace: 0

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 1
significant whitespace: 0

Xml: <root>test1 <a>test2</a> <a>test3</a></root>
whitespace: 0
significant whitespace: 1

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 0
significant whitespace: 1
```
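Regarding question 3: the manual approach in step 3.a only converts direct children of the document element. A minimal sketch (not from the original posts) that applies the same ReplaceChild idea to every XmlWhitespace node in a subtree could look like this. WhitespaceNormalizer and ConvertWhitespaceToSignificant are hypothetical names, and the sketch assumes the document was loaded with PreserveWhitespace = true, since otherwise there are no XmlWhitespace nodes left to convert.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

internal static class WhitespaceNormalizer
{
    // Hypothetical helper: replaces every XmlWhitespace node below 'root' (at any depth)
    // with an XmlSignificantWhitespace node holding the same text.
    // Returns the number of nodes that were replaced.
    public static int ConvertWhitespaceToSignificant(XmlNode root)
    {
        XmlDocument doc = root as XmlDocument ?? root.OwnerDocument;

        // Collect first, then replace: modifying a live ChildNodes list while
        // enumerating it would invalidate the enumeration.
        var targets = new List<XmlWhitespace>();
        Collect(root, targets);

        foreach (XmlWhitespace ws in targets)
        {
            XmlSignificantWhitespace significant = doc.CreateSignificantWhitespace(ws.Value);
            ws.ParentNode.ReplaceChild(significant, ws);
        }
        return targets.Count;
    }

    // Depth-first walk that records XmlWhitespace nodes.
    private static void Collect(XmlNode node, List<XmlWhitespace> targets)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child is XmlWhitespace ws)
            {
                targets.Add(ws);
            }
            else
            {
                Collect(child, targets);
            }
        }
    }

    private static void Main()
    {
        // PreserveWhitespace = true keeps the inter-element spaces as XmlWhitespace nodes.
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml("<root>test1 <a>test2</a> <b><c/> <d/></b></root>");

        int replaced = ConvertWhitespaceToSignificant(doc.DocumentElement);
        Console.WriteLine($"Replaced {replaced} whitespace node(s).");
        Console.WriteLine(doc.OuterXml);
    }
}
```

This only changes the node types in memory. The serialized XML is identical, so reloading it without an xml:space="preserve" scope gives ordinary whitespace handling again.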
Upvotes: 2
Views: 3828
Reputation: 1148
Writing your own XmlNodeReader seems to work, although it is not the "cleanest" solution.
Consider the current implementation of XmlReader.MoveToContent():
```csharp
public virtual XmlNodeType MoveToContent() {
    do {
        switch (this.NodeType) {
            case XmlNodeType.Attribute:
                MoveToElement();
                goto case XmlNodeType.Element;
            case XmlNodeType.Element:
            case XmlNodeType.EndElement:
            case XmlNodeType.CDATA:
            case XmlNodeType.Text:
            case XmlNodeType.EntityReference:
            case XmlNodeType.EndEntity:
                return this.NodeType;
        }
    } while (Read());
    return this.NodeType;
}
```
To mark SignificantWhitespace as content as well, also return the NodeType when it is XmlNodeType.SignificantWhitespace.
Here's the complete implementation of my own WhitespaceXmlNodeReader:
```csharp
using System.Xml;

internal class WhitespaceXmlNodeReader : XmlNodeReader
{
    public WhitespaceXmlNodeReader(XmlNode node)
        : base(node)
    {
    }

    public override XmlNodeType MoveToContent()
    {
        do
        {
            switch (NodeType)
            {
                case XmlNodeType.Attribute:
                    MoveToElement();
                    goto case XmlNodeType.Element;
                case XmlNodeType.Element:
                case XmlNodeType.EndElement:
                case XmlNodeType.CDATA:
                case XmlNodeType.Text:
                case XmlNodeType.EntityReference:
                case XmlNodeType.EndEntity:
                // This was added:
                case XmlNodeType.SignificantWhitespace:
                    return NodeType;
            }
        } while (Read());
        return NodeType;
    }
}
```
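To see what the override changes, here is a minimal sketch (not part of the original answer) that positions both a plain XmlNodeReader and the WhitespaceXmlNodeReader on the significant space between the two a elements and then calls MoveToContent(). The MoveToContentDemo class and its MoveToContentFromWhitespace helper are hypothetical names. The base implementation should skip the whitespace and stop on the next a element, while the override should stay on the SignificantWhitespace node.

```csharp
using System;
using System.Xml;

// Same class as in the answer above, repeated so this sketch compiles on its own.
internal class WhitespaceXmlNodeReader : XmlNodeReader
{
    public WhitespaceXmlNodeReader(XmlNode node) : base(node) { }

    public override XmlNodeType MoveToContent()
    {
        do
        {
            switch (NodeType)
            {
                case XmlNodeType.Attribute:
                    MoveToElement();
                    goto case XmlNodeType.Element;
                case XmlNodeType.Element:
                case XmlNodeType.EndElement:
                case XmlNodeType.CDATA:
                case XmlNodeType.Text:
                case XmlNodeType.EntityReference:
                case XmlNodeType.EndEntity:
                case XmlNodeType.SignificantWhitespace:
                    return NodeType;
            }
        } while (Read());
        return NodeType;
    }
}

internal static class MoveToContentDemo
{
    // Hypothetical helper: reads forward to the first SignificantWhitespace node,
    // then returns whatever MoveToContent() reports from that position.
    public static XmlNodeType MoveToContentFromWhitespace(XmlReader reader)
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.SignificantWhitespace)
            {
                return reader.MoveToContent();
            }
        }
        throw new InvalidOperationException("No significant whitespace found.");
    }

    private static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root xml:space=\"preserve\">test1 <a>test2</a> <a>test3</a></root>");

        using (XmlReader plain = new XmlNodeReader(doc))
        using (XmlReader custom = new WhitespaceXmlNodeReader(doc))
        {
            Console.WriteLine($"XmlNodeReader: {MoveToContentFromWhitespace(plain)}");
            Console.WriteLine($"WhitespaceXmlNodeReader: {MoveToContentFromWhitespace(custom)}");
        }
    }
}
```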
Upvotes: 1
Reputation: 1797
The Microsoft documentation is unclear and at least partly inaccurate. Although the Microsoft documentation for the XmlSignificantWhitespace Class says that "white space between markup in a mixed content node" is "significant whitespace," the actual XmlDocument loading and parsing behavior is not consistent with that. Related documentation is PreserveWhitespace and White Space and Significant White Space Handling when Loading the DOM, but these don't provide enough specific detail.
Empirically, as you've demonstrated with your test cases and with my own testing, the behavior is as follows:

- Whitespace between elements in a mixed content node is preserved both with XmlDocument.PreserveWhitespace = true upon load and within an xml:space="preserve" scope. However, in the former case it is preserved in Whitespace nodes rather than SignificantWhitespace nodes.
- With XmlDocument.PreserveWhitespace = false, whitespace between elements in a mixed content node is discarded, contrary to the XmlSignificantWhitespace Class documentation.
- Whitespace is only loaded into SignificantWhitespace nodes within an xml:space="preserve" scope. In this case, it is always preserved as SignificantWhitespace, regardless of the XmlDocument.PreserveWhitespace setting.

In short, the only way to parse whitespace directly into SignificantWhitespace nodes is within an xml:space="preserve" scope. One way that might work for you is to wrap your XML content in a new outer element with xml:space="preserve". I don't know why your CreateDocumentFragment() test did not work, but here's a bit of code that does work:
```csharp
// 4. Loading the XML within an xml:space="preserve" element works
doc.InnerXml = "<root xml:space=\"preserve\"></root>";
doc.FirstChild.InnerXml = "test1 <a>test2</a> <a>test3</a>";
PrintDoc(doc);
```

This results in:

```
Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 0
significant whitespace: 1
```
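The observations listed in this answer can be reproduced with a small sketch like the following (not from the original posts; WhitespaceLoadingDemo and its Count helper are hypothetical names). It loads the question's XML with and without an xml:space="preserve" root, under both PreserveWhitespace settings, and counts the XmlWhitespace and XmlSignificantWhitespace children of the root. Per the behavior described above, only PreserveWhitespace = true without xml:space should yield an XmlWhitespace node, and both xml:space="preserve" cases should yield an XmlSignificantWhitespace node.

```csharp
using System;
using System.Linq;
using System.Xml;

internal static class WhitespaceLoadingDemo
{
    // Loads 'xml' with the given PreserveWhitespace setting and counts the
    // XmlWhitespace and XmlSignificantWhitespace children of the document element.
    public static (int Whitespace, int Significant) Count(string xml, bool preserveWhitespace)
    {
        var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
        doc.LoadXml(xml);
        var nodes = doc.DocumentElement.ChildNodes.Cast<XmlNode>().ToList();
        return (nodes.OfType<XmlWhitespace>().Count(), nodes.OfType<XmlSignificantWhitespace>().Count());
    }

    private static void Main()
    {
        const string plain = "<root>test1 <a>test2</a> <a>test3</a></root>";
        const string preserved = "<root xml:space=\"preserve\">test1 <a>test2</a> <a>test3</a></root>";

        foreach (bool preserve in new[] { false, true })
        {
            Console.WriteLine($"PreserveWhitespace={preserve}, no xml:space: {Count(plain, preserve)}");
            Console.WriteLine($"PreserveWhitespace={preserve}, xml:space=\"preserve\": {Count(preserved, preserve)}");
        }
    }
}
```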
Upvotes: 3