Reputation: 13441
I have an XML document like the one below. I want to get all direct child nodes under Node1. I am trying to use childNodes; however, it returns Node21 and Node22 as well. How can I get only the direct child nodes?
<Node1>
<Node11>
<Node21>
</Node21>
<Node22>
</Node22>
<Node23>
</Node23>
</Node11>
<Node12>
</Node12>
<Node13>
</Node13>
</Node1>
UPDATE: Sorry for the confusion. I made a mistake; it seems childNodes does return only the direct child nodes. However, the number of items in childNodes still exceeds the number of real child nodes. When I print each nodeName, I get a lot of "#text".
Upvotes: 2
Views: 4920
Reputation: 273416
xml.etree.ElementTree.Element supports the iterator protocol, so you can use list(elem) to get its direct children, as follows:
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
s = '''
<Node1>
<Node11>
<Node21>
</Node21>
<Node22>
</Node22>
<Node23>
</Node23>
</Node11>
<Node12>
</Node12>
<Node13>
</Node13>
</Node1>
'''
root = ET.fromstring(s)
print(root)        # the root element, Node1
print(list(root))  # only the direct children: Node11, Node12, Node13
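If you only need the names of the direct children, here is a minimal sketch, assuming Python 3 and a shortened copy of the document from the question (the variable names are just for illustration). Iterating over an Element yields only its immediate sub-elements, and ElementTree keeps whitespace in the .text and .tail attributes rather than as separate nodes, so no "#text" entries show up:

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question (illustrative only).
s = "<Node1><Node11><Node21/><Node22/><Node23/></Node11><Node12/><Node13/></Node1>"

root = ET.fromstring(s)

# Iterating an Element visits its direct children only, not grandchildren.
child_tags = [child.tag for child in root]
print(child_tags)  # ['Node11', 'Node12', 'Node13']
```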
Upvotes: 4
Reputation: 3326
There are two ways you can go about handling the text nodes. If you really want to keep using dom, you can get rid of the text nodes with a filter:
>>> list(filter(lambda node: node.nodeType != xml.dom.Node.TEXT_NODE, myNode.childNodes))
[<DOM Element: Node11 at 0x18e64d0>, <DOM Element: Node12 at 0x18e6950>, <DOM Element: Node13 at 0x18e6a70>]
or a list comprehension:
>>> [x for x in myNode.childNodes if x.nodeType != xml.dom.Node.TEXT_NODE]
[<DOM Element: Node11 at 0x18e64d0>, <DOM Element: Node12 at 0x18e6950>, <DOM Element: Node13 at 0x18e6a70>]
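For a self-contained version, here is a rough sketch assuming Python 3 and xml.dom.minidom; xml_text is a trimmed copy of the question's document and the variable names are made up for the example. It shows where the "#text" entries come from (the newlines between the tags) and uses a slight variation on the filter above: keeping nodes whose nodeType is ELEMENT_NODE, which would also skip comments if the document had any.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Trimmed copy of the question's document, newlines between tags included.
xml_text = """<Node1>
<Node11>
<Node21>
</Node21>
</Node11>
<Node12>
</Node12>
<Node13>
</Node13>
</Node1>"""

root = parseString(xml_text).documentElement

# childNodes holds direct children only, but that includes the
# whitespace between tags, which the DOM exposes as "#text" nodes.
all_names = [n.nodeName for n in root.childNodes]

# Keep element nodes only.
element_names = [n.nodeName for n in root.childNodes
                 if n.nodeType == Node.ELEMENT_NODE]

print(all_names)
print(element_names)  # ['Node11', 'Node12', 'Node13']
```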
If you don't need to keep using dom, I would suggest using ElementTree as Eli Bendersky suggested.
Upvotes: 1