Gordon

Powershell XML ChildNodes returning deeper nodes

I have been using .SelectNodes() for some time to get the child nodes of a particular node, and it works, but as files get bigger it seems to get slower. So I started using .ChildNodes, and I am finding that it gets more than just the child nodes: it goes deeper, getting grandchild nodes. Given this:

$xml = [Xml]@"
<root>
   <one>
      <element1>element text</element1>
      <element2>element text</element2>
      <two>
        <element3>element text</element3>
      </two>
      <name>Name text</name>
   </one>
</root>
"@
foreach ($element in $xml.DocumentElement.ChildNodes | where {$_.NodeType -eq 'Element'}) {
    Write-Host "$($element.Name) $($element.InnerText)"
}

I would have expected to get back only the single <one> node, as it is the only child node of root. Yet what I get back is "Name text element textelement textelement textName text", which makes no sense at all to me, especially since I would at least have expected multiple items, with the last line being "name Name text". Instead, the first item in the line is the last node's name. Now, I know naming a node 'name' is a bad idea, and I am working on code to address that. But even if I change the name of that node to <nametext>Name text</nametext>, what I get back is a different kind of wrong: "one element textelement textelement textName text".

So, what am I doing wrong? Is it even possible to use .ChildNodes and get only the actual child nodes, no deeper? And what IS going on here? I suspect that if I understood WHY the bowl of Petunias said that, I would understand the universe better.

Answers (1)

mklement0

I would have expected to only get back the single <one> node

Indeed that is what you're getting back, though it isn't obvious from the default display formatting.
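A quick way to confirm this is to count the element nodes in .ChildNodes rather than printing them. The sketch below reuses the question's document and its NodeType filter; it reads the type-native .LocalName property instead of .Name only to sidestep the shadowing issue discussed further down (this works here because <one> has no <LocalName> child).

```powershell
# Same document as in the question.
$xml = [xml] @"
<root>
   <one>
      <element1>element text</element1>
      <element2>element text</element2>
      <two>
        <element3>element text</element3>
      </two>
      <name>Name text</name>
   </one>
</root>
"@

# .ChildNodes holds only the direct children of <root>; keep element nodes only.
$children = @($xml.DocumentElement.ChildNodes | Where-Object NodeType -eq 'Element')

$children.Count            # 1
$children[0].LocalName     # one
```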

Instead the first item in the line is the last node's name.

The simplest way to visualize XML nodes is to access their .OuterXml property, which returns an XML text representation of the node and all of its descendants.
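Applied to the question's loop, that could look like the following sketch (same document, same NodeType filter). It yields a single string that starts with <one> and contains the whole subtree, which makes it obvious that only one node was returned. Since the [xml] cast does not preserve insignificant whitespace, the serialized text should come out without the original indentation.

```powershell
$xml = [xml] @"
<root>
   <one>
      <element1>element text</element1>
      <element2>element text</element2>
      <two>
        <element3>element text</element3>
      </two>
      <name>Name text</name>
   </one>
</root>
"@

# One OuterXml string per direct child element of <root>.
$outer = @($xml.DocumentElement.ChildNodes |
    Where-Object NodeType -eq 'Element' |
    ForEach-Object OuterXml)

$outer
```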

PowerShell's adaptation of the XML DOM represents child elements and attributes of a given element as properties of that element, and such properties shadow (override) type-native properties of the underlying System.Xml.XmlNode instance:

  • A <name> child element (or attribute) therefore shadows the type-native Name property, which is why Name text showed up in your output.

    • The workaround, as AdminOfThings points out, is to use the hidden .get_Name() method.
  • As an aside: another fairly common, albeit technically distinct[1] shadowing scenario arises when the elements of an array of XmlElement instances, as obtained via member-access enumeration, have <item> child elements; these then cannot be accessed via .item and require a looping workaround; see this answer.
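The shadowing of Name and the .get_Name() workaround can be seen side by side in this sketch, which uses a trimmed-down version of the question's document:

```powershell
$xml = [xml] @"
<root>
   <one>
      <element1>element text</element1>
      <name>Name text</name>
   </one>
</root>
"@
$one = $xml.root.one

# The adapted <name> child shadows XmlNode.Name, so this yields the child's text.
$one.Name          # Name text

# The hidden getter method still reaches the type-native Name property.
$one.get_Name()    # one
```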

But even if I change the name of that node, so <nametext>Name text</nametext>, what I get back is a different kind of wrong, one element textelement textelement textName text.

There is nothing wrong with that result: it reflects the target node's element name, 'one', followed by the direct concatenation of all the text nodes it contains across its sub-node hierarchy (see System.Xml.XmlElement.InnerText); that is, you're seeing the result of the following string concatenation:
'element text' + 'element text' + 'element text' + 'Name text'
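The sketch below (using the question's renamed <nametext> variant) checks that concatenation and then goes one level deeper, which seems to be the output the question was expecting: one line per child of <one>. Names are read via .get_Name() so that the loop keeps working even if a child called <name> is reintroduced.

```powershell
$xml = [xml] @"
<root>
   <one>
      <element1>element text</element1>
      <element2>element text</element2>
      <two>
        <element3>element text</element3>
      </two>
      <nametext>Name text</nametext>
   </one>
</root>
"@
$one = $xml.root.one

# InnerText concatenates every descendant text node, with no separator.
$one.InnerText -eq ('element text' + 'element text' + 'element text' + 'Name text')   # True

# To list each child of <one> separately, iterate over *its* ChildNodes.
$lines = foreach ($child in $one.ChildNodes) {
    "$($child.get_Name()) $($child.InnerText)"
}
$lines
```

Note that <two> still reports 'element text' here, because its InnerText in turn includes its own <element3> descendant.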


[1] In member-access enumeration, the type-native properties of an array/collection take precedence over properties of its elements, which is logically the inverse of the XML adaptation on a single XmlElement, where the adapted property takes precedence, as discussed above.
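To illustrate that inverse precedence, here is a sketch with a hypothetical document in which two <x> siblings each have a <Length> child; Length stands in for the <item> case above, because an array's type-native Length is easy to observe. On the array, the array's own Length wins; on each individual element, the XML-adapted property wins, which is why a loop serves as the workaround.

```powershell
# Hypothetical document: two <x> siblings, each with a <Length> child.
$xml = [xml] '<root><x><Length>5</Length></x><x><Length>7</Length></x></root>'

# $xml.root.x is an object[] of two XmlElement instances; the array's
# type-native Length takes precedence, so no member-access enumeration occurs.
$xml.root.x.Length                                   # 2

# Looping workaround: on a single XmlElement the adapted <Length> child wins.
$perElement = @($xml.root.x | ForEach-Object { $_.Length })
$perElement                                          # 5, 7 (as strings)
```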
