Reputation: 2996
I have an XML document that represents a directed graph. It contains a large number of direct children, all with ids, and a large number of nested children, all with the same tag names but no ids, just references.
I would like to iterate over all of the direct children of the root node, but exclude the nested children. The files look something like this, but with hundreds of nodes and dozens of different tags:
<graph>
<foo id="f1"><bar ref="b1" /><baz ref="z1" />...</foo>
<bar id="b1"><foo ref="f1" /></bar>
<baz id="z1"></baz>
...
</graph>
I don't want to use getElementsByTagName because it returns all descendants. I suspect I will need to use .childNodes and filter the results, but I want to make sure there isn't something I'm missing.
Also, I don't control the input; it comes from an outside source. I'm using Python's xml.dom.minidom module, but I expect that to be an implementation detail.
Upvotes: 4
Views: 1971
Reputation: 11
For information (for Alessandro): for xml.dom.minidom you can find examples like this one for getting the children of a node (xmlNode) with a given tag name (name):
children = xmlNode.getElementsByTagName(name)
If you run that snippet on graph in the given example to get all foo elements, you will not get 1 foo but 2, because there is another foo inside the bar:
<bar id="b1">
<foo ref="f1" />
</bar>
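To make the count concrete, here is a minimal sketch that runs getElementsByTagName on a trimmed copy of the question's example. The XML string is an assumption: the "..." placeholders from the question are simply left out, and the variable names are only illustrative.

```python
from xml.dom import minidom

# Trimmed copy of the example from the question (the "..." parts are omitted).
XML = """<graph>
<foo id="f1"><bar ref="b1" /><baz ref="z1" /></foo>
<bar id="b1"><foo ref="f1" /></bar>
<baz id="z1"></baz>
</graph>"""

graph = minidom.parseString(XML).documentElement

# getElementsByTagName walks the whole subtree, so the nested
# <foo ref="f1" /> inside <bar id="b1"> is returned as well.
all_foos = graph.getElementsByTagName("foo")
foo_count = len(all_foos)

# getAttribute returns an empty string when the attribute is missing,
# which is the case for the nested foo that only has a ref.
foo_ids = [foo.getAttribute("id") for foo in all_foos]

print(foo_count, foo_ids)
```

The nested foo shows up with an empty id, which is one way to spot it in the result.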
All the examples I found online use this function, which finds every matching element anywhere in the subtree under the given node.
I am still looking for a function that returns only the direct children. (There might be one, or not.)
I am currently testing building the lists from
xmlNode.childNodes
The problem with this seems to be that getElementsByTagName is not available on all of the nodes you get (childNodes also contains text nodes, such as the whitespace between tags, and those do not have that method). That does not matter to me.
It also stops on the lower layers for some reason, so I am now looking into accessing the individual elements of the lists with
xmlNode.childNodes.item(i)
because it is not working as intended yet. (The functions that analyze the graph are called 0 times.)
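One way the childNodes approach can be made to work is to keep only element nodes and compare tag names by hand. This is a sketch, not a built-in minidom feature: direct_children is a hypothetical helper name, and the XML string is a trimmed copy of the question's example.

```python
from xml.dom import minidom

# Trimmed copy of the example from the question (the "..." parts are omitted).
XML = """<graph>
<foo id="f1"><bar ref="b1" /><baz ref="z1" /></foo>
<bar id="b1"><foo ref="f1" /></bar>
<baz id="z1"></baz>
</graph>"""


def direct_children(node, tag_name=None):
    """Hypothetical helper: element children of node, optionally filtered by tag.

    childNodes also contains text nodes (the newlines between tags here),
    so the nodeType check is needed before touching tagName.
    """
    return [
        child
        for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE
        and (tag_name is None or child.tagName == tag_name)
    ]


graph = minidom.parseString(XML).documentElement

# Only the three top-level elements, in document order.
top_level_ids = [el.getAttribute("id") for el in direct_children(graph)]

# Only the top-level foo, not the one nested inside bar.
top_level_foos = direct_children(graph, "foo")

print(top_level_ids, len(top_level_foos))
```

The same helper can be called again on any returned element to go one level deeper, since every element has its own childNodes.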
If you don't want to run into this problem: use a different module. (See post by Alessandro.)
TL;DR: Most of the examples you can find online assume either that you are not looking for only the direct children of a node, or that elements with the same tag name do not occur at lower levels of the (sub)tree you are searching in.
-> Examples can be wrong / insufficient. RTFM ;)
Upvotes: 1
Reputation: 38
I'm not really sure what you wanted to get out of the direct children, so I gave you a few different examples.
from lxml import etree

# xml is assumed to hold the document text (str or bytes)
root = etree.fromstring(xml)
for node in root.iter("graph"):
    # Namespaces of the direct children (None when a child has no namespace)
    namespaces = [etree.QName(child).namespace for child in node]
    # Tags of the direct children
    tags = [child.tag for child in node]
    # Text of the direct children
    texts = [child.text for child in node]
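If the goal is to pick direct children by tag name, the sketch below may help. It deliberately swaps in the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the iteration and findall calls used here behave the same way in lxml.etree as far as I know. The XML string is a trimmed copy of the question's example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example from the question (the "..." parts are omitted).
XML = """<graph>
<foo id="f1"><bar ref="b1" /><baz ref="z1" /></foo>
<bar id="b1"><foo ref="f1" /></bar>
<baz id="z1"></baz>
</graph>"""

root = ET.fromstring(XML)

# Iterating an element yields its direct children only.
direct_tags = [child.tag for child in root]

# findall with a bare tag name matches direct children only ...
direct_foos = root.findall("foo")

# ... while iter walks the whole subtree, like getElementsByTagName.
all_foos = list(root.iter("foo"))

print(direct_tags, len(direct_foos), len(all_foos))
```

Here findall("foo") returns just the top-level foo, while iter("foo") also picks up the one nested inside bar.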
Upvotes: 1