joelc

Reputation: 2761

Iterate each node in XmlDocument by what it contains

While recursively iterating over each node in an XML document, I'm having a difficult time determining whether the current node has a value or contains embedded XML.

It seems that XmlNode.NodeType is Element in both cases, and when the element holds a value (and not more XML) its ChildNodes.Count is not zero (it's actually 1).

A simple XML file I'm using for testing is:

<note>
  <to>You</to>
  <from>Me</from>
  <subject>Hello!</subject>
  <body>Check out this cool data!</body>
  <data>
    <name>Something cool</name>
    <location>Mars</location>
    <distance>54 million kilometers</distance>
  </data>
</note>

Each of the XmlNodes above is of type Element and has ChildNodes.Count >= 1.
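The count of 1 comes from a Text child node, not a nested element. A minimal sketch that makes this visible, assuming the XML is loaded with XmlDocument.LoadXml and the default PreserveWhitespace = false (so whitespace between elements is dropped). The class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ChildTypeDemo
{
    // Returns one line per element, e.g. "to: Text" or "data: Element, Element, Element",
    // listing the NodeType of each of its direct children.
    public static List<string> DescribeChildren(XmlNode node)
    {
        var lines = new List<string>();
        Walk(node, lines);
        return lines;
    }

    private static void Walk(XmlNode node, List<string> lines)
    {
        if (node.NodeType == XmlNodeType.Element)
        {
            var types = new List<string>();
            foreach (XmlNode child in node.ChildNodes)
                types.Add(child.NodeType.ToString());
            lines.Add(node.Name + ": " + string.Join(", ", types));
        }
        // Recurse depth-first so nested elements are listed after their parent.
        foreach (XmlNode child in node.ChildNodes)
            Walk(child, lines);
    }
}
```

For the sample document this should report "to: Text" for the value elements and only Element children for note and data.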

What can I use to reliably test whether an XmlNode should be treated as a container (like note and data) or as holding a value (like to, from, subject, body, name, location, distance)?

Upvotes: 1

Views: 169

Answers (4)

Dave

Reputation: 70

Check out the answers to this question to see whether they get you going in the right direction:

How to get "real" ChildNodes of XmlNode, ignoring whitespace nodes?
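A minimal sketch of that idea, assuming the document may be loaded with PreserveWhitespace = true (which adds whitespace-only nodes between elements). The extension method name RealChildNodes is hypothetical, and skipping SignificantWhitespace as well is a design choice you may not want if you rely on xml:space="preserve":

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XmlNodeExtensions
{
    // Yields the child nodes of 'node' without the whitespace-only nodes
    // that sit between elements when whitespace is preserved on load.
    public static IEnumerable<XmlNode> RealChildNodes(this XmlNode node)
    {
        return node.ChildNodes
            .Cast<XmlNode>()
            .Where(n => n.NodeType != XmlNodeType.Whitespace
                     && n.NodeType != XmlNodeType.SignificantWhitespace);
    }
}
```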

Upvotes: 0

Richard Schneider

Reputation: 35464

Based on your example, you could check whether the first child node is of type Element.

// Treat the node as a container when its first child is itself an element.
bool IsContainer(XmlNode node) {
  return node.ChildNodes.Count > 0 && node.ChildNodes[0].NodeType == XmlNodeType.Element;
}

Note that this will not handle mixed content (text and elements side by side).
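Checking only the first child can also miss nested elements when that child is a comment or a preserved whitespace node. A variant sketch (the class name is hypothetical) that looks at all children instead; here mixed content is simply counted as a container:

```csharp
using System.Linq;
using System.Xml;

public static class ContainerCheck
{
    // A node is a container if ANY direct child is an element, so a leading
    // comment or whitespace node does not hide the nested elements.
    public static bool IsContainer(XmlNode node)
    {
        return node.ChildNodes.OfType<XmlElement>().Any();
    }
}
```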

Upvotes: 0

TKharaishvili

Reputation: 2099

I don't know whether you can use System.Xml.Linq.XElement instead of XmlDocument here, but if you can, you could go about it this way:

var xml = XElement.Parse("<note> .... </note>");

then

xml.Elements().Count()

returns 5, the correct number of child elements, whereas

xml.Elements().First().Elements().Count()

returns 0 because the to node has no child elements (its only child is text).
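Building on that, XElement.HasElements gives the container/value test directly, since it only looks at child elements and ignores text and whitespace. A recursive sketch, with hypothetical class and method names, that collects "name=value" for every leaf element:

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class XElementWalker
{
    // Returns "name=value" for each element that has no child elements,
    // descending into every element that does.
    public static List<string> Leaves(XElement element)
    {
        var result = new List<string>();
        Collect(element, result);
        return result;
    }

    private static void Collect(XElement element, List<string> result)
    {
        if (!element.HasElements)
        {
            // Value concatenates all text content of the element.
            result.Add(element.Name.LocalName + "=" + element.Value);
            return;
        }
        foreach (XElement child in element.Elements())
            Collect(child, result);
    }
}
```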

Upvotes: 1

Alexei Levenkov

Reputation: 100630

Usually you know which nodes contain values because you know the structure of the XML.

If you need to infer that information from XML of arbitrary structure: text is represented by Text and CDATA nodes, so you can check whether an element has only children of those types to find "text only" nodes. See How to get text inside an XmlNode.
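A minimal sketch of that check (class and method names are hypothetical). Treating an element with no children at all as "not text only" is an assumption; adjust it if empty elements should count as empty values:

```csharp
using System.Xml;

public static class TextOnlyCheck
{
    // True when the node has at least one child and every child is a
    // Text or CDATA node, i.e. it holds a value rather than nested XML.
    public static bool IsTextOnly(XmlNode node)
    {
        if (!node.HasChildNodes) return false;
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Text &&
                child.NodeType != XmlNodeType.CDATA)
                return false;
        }
        return true;
    }
}
```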

Some gotchas to be aware of and make decisions about:

  • mixed content nodes (<r>foo <v/> bar</r>): decide what you want to do with them. For example, nodes with HTML content generally contain mixed content.
  • text nodes representing insignificant whitespace between elements (<r> <n/> </r>). You should ignore those unless you must preserve document formatting.
  • multiple nodes representing a single piece of text. Depending on how the XML is loaded or constructed, a single piece of text may be represented by a collection of child text nodes instead of a single node.
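One way to fold these decisions into a single check is to classify each element by what its children contain. A sketch under these assumptions: whitespace-only text is ignored, any number of Text/CDATA nodes count as one value (read it with InnerText, which concatenates them), comments and processing instructions are skipped, and text next to elements is reported as Mixed. The enum and class names are hypothetical:

```csharp
using System.Xml;

public enum ElementKind { Empty, Value, Container, Mixed }

public static class ElementClassifier
{
    public static ElementKind Classify(XmlElement element)
    {
        bool hasText = false, hasElement = false;
        foreach (XmlNode child in element.ChildNodes)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Element:
                    hasElement = true;
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    // Whitespace-only text is treated as insignificant.
                    if (!string.IsNullOrWhiteSpace(child.Value)) hasText = true;
                    break;
                // Whitespace, SignificantWhitespace, comments and PIs are ignored.
            }
        }
        if (hasElement && hasText) return ElementKind.Mixed;
        if (hasElement) return ElementKind.Container;
        return hasText ? ElementKind.Value : ElementKind.Empty;
    }
}
```

For a Value element, element.InnerText returns the combined text even when it is split across several child nodes.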

Upvotes: 1
