Reputation: 4721
I am trying to iterate over all nodes in a tree using ElementTree.
I do something like:
import xml.etree.ElementTree as ET

tree = ET.parse("/tmp/test.xml")
root = tree.getroot()
for child in root:
    pass  # do something with child
The problem is that child is an Element object and not an ElementTree object, so I can't look further into it and recurse over its elements. Is there a way to iterate differently over "root" so that it iterates over the top-level nodes in the tree (immediate children) and returns the same class as the root itself?
Upvotes: 48
Views: 133707
Reputation: 711
If you need to iterate starting from an element, you can do that with a simple recursion:
def children(elem):
    # do something with elem
    if len(elem) > 0:  # has children
        for child in elem:
            children(child)
And starting from the OP's code:
tree = ET.parse("/tmp/test.xml")
root = tree.getroot()
for child in root:
    children(child)
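The same recursion can be written as a generator so the caller decides what to do with each node. This is only a minimal sketch: the name walk and the sample document are made up for illustration. It relies on the fact that an Element is itself iterable over its direct children, so no ElementTree object is needed to recurse.

```python
import xml.etree.ElementTree as ET

def walk(elem, depth=0):
    """Yield (depth, element) for elem and all of its descendants, in document order."""
    yield depth, elem
    for child in elem:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)

# Hypothetical sample document, used only for illustration.
root = ET.fromstring("<a><b><c/></b><d/></a>")
visited = [(depth, elem.tag) for depth, elem in walk(root)]
print(visited)
```

Passing the depth down as an argument avoids any shared counter between recursive calls.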
Upvotes: 0
Reputation: 825
An excellent solution for converting XML to a dict (see https://stackoverflow.com/a/68082847/3505444):
def etree_to_dict(t):
    if type(t) is ET.ElementTree:
        return etree_to_dict(t.getroot())
    # note: sibling elements sharing a tag overwrite each other (last one wins)
    return {
        **t.attrib,
        'text': t.text,
        **{e.tag: etree_to_dict(e) for e in t}
    }
and:
def nested_dict_pairs_iterator(dict_obj):
    ''' This function accepts a nested dictionary as argument
        and iterates over all values of nested dictionaries
    '''
    # Iterate over all key-value pairs of dict argument
    for key, value in dict_obj.items():
        # Check if value is of dict type
        if isinstance(value, dict):
            # If value is dict then iterate over all its values
            for pair in nested_dict_pairs_iterator(value):
                yield (key, *pair)
        else:
            # If value is not dict type then yield the value
            yield (key, value)
finally:
# myet is the parsed ElementTree; etree_to_dict unwraps it to its root
root_dict = etree_to_dict(myet)
for pair in nested_dict_pairs_iterator(root_dict):
    print(pair)
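One caveat with a dict keyed by tag is that sibling elements sharing a tag collapse into a single entry. The sketch below is a hypothetical variant (the name etree_to_dict_multi is not from the linked answer) that groups children into lists per tag so repeated siblings survive. The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def etree_to_dict_multi(elem):
    """Hypothetical variant: collect children into one list per tag so repeats are kept."""
    children = defaultdict(list)
    for child in elem:
        children[child.tag].append(etree_to_dict_multi(child))
    return {**elem.attrib, "text": elem.text, **children}

# Two siblings with the same tag; a plain tag-keyed dict would keep only one.
root = ET.fromstring('<data><country name="A"/><country name="B"/></data>')
result = etree_to_dict_multi(root)
names = [c["name"] for c in result["country"]]
print(names)
```

Note that a pair iterator which only descends into dict values would need an extra branch for lists to walk this structure.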
Upvotes: 1
Reputation: 6826
While iter() is all very good, I needed a way to walk an XML hierarchy while tracking the nesting level, and iter() doesn't help at all with that. I wanted something like iterparse(), which emits start and end events at each level of the hierarchy, but I already have the ElementTree, so I didn't want the unnecessary step and overhead of converting to a string and re-parsing that using iterparse() would require.
Surprised I couldn't find this, I had to write it myself:
def iterwalk(root, events=None, tags=None):
    """Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; call it to get an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if empty/None events are generated for all tags
    """
    # each stack entry consists of a list of the xml element and a second entry initially None
    # if the second entry is None a start is emitted and all children of current element are put into the second entry
    # if the second entry is a non-empty list the first item in it is popped and then a new stack entry is created
    # once the second entry is an empty list, an end is generated and then stack is popped
    stack = [[root,None]]
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    def iterator():
        while stack:
            elnow,children = stack[-1]
            if children is None:
                # this is the start of elnow so emit a start and put its children into the stack entry
                if ( not tags or elnow.tag in tags ) and "start" in events:
                    yield ("start",elnow)
                # put the children into the top stack entry
                stack[-1][1] = list(elnow)
            elif len(children)>0:
                # do a child and remove it
                thischild = children.pop(0)
                # and now create a new stack entry for this child
                stack.append([thischild,None])
            else:
                # finished these children - emit the end
                if ( not tags or elnow.tag in tags ) and "end" in events:
                    yield ("end",elnow)
                stack.pop()
    return iterator
# myxml is my parsed XML which has nested Binding tags, I want to count the depth of nesting
# Now explore the structure
it = iterwalk(myxml, tags='Binding')
level = 0
for event,el in it():
    if event == "start":
        level += 1
        print( f"{level} {el.tag=}" )
    if event == "end":
        level -= 1
The stack is used so that you can emit the start events as you go down the hierarchy and then correctly backtrack. The last entry in the stack is initially [el, None], so the start event for el is emitted and the entry is updated to [el, children]. Each child is removed from children as it is entered; after the last child has been processed the entry is [el, []], at which point the end event for el is emitted and the top entry is removed from the stack.
I did it this way with the stack because I'm not fond of debugging recursive code and anyway I'm not sure how to write a recursive iterator function.
Here's a recursive version, which is easier to understand but would be difficult to debug if it weren't so simple and something went wrong. And I learned about yield from :-)
def iterwalk1(root, events=None, tags=None):
    """Recursive version - Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; call it to get an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if None or empty list then events are generated for all tags
    """
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    def recursiveiterator(el,suppressyield=False):
        if not suppressyield and ( not tags or el.tag in tags ) and "start" in events:
            yield ("start",el)
        for child in list(el):
            yield from recursiveiterator(child)
        if not suppressyield and ( not tags or el.tag in tags ) and "end" in events:
            yield ("end",el)
    def iterator():
        # suppressyield=True: no events are emitted for root itself, only for its descendants
        yield from recursiveiterator( root, suppressyield=True )
    return iterator
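For comparison, here is a stripped-down sketch of the same idea as a plain generator, without the events and tags filtering and without the extra call needed to obtain the iterator. The name walk_events and the sample document are assumptions for illustration only. Unlike iterwalk1 above, it also emits events for the root element.

```python
import xml.etree.ElementTree as ET

def walk_events(elem):
    """Yield ("start", elem) before an element's children and ("end", elem) after them."""
    yield "start", elem
    for child in elem:
        yield from walk_events(child)
    yield "end", elem

# Hypothetical sample with nested Binding tags.
root = ET.fromstring(
    "<Root><Binding><Binding><Binding/></Binding></Binding><Binding/></Root>"
)
level = 0
max_level = 0
for event, el in walk_events(root):
    if el.tag != "Binding":
        continue  # the filtering is done by the caller in this sketch
    if event == "start":
        level += 1
        max_level = max(max_level, level)
    else:
        level -= 1
print(max_level)
```

Because every start is matched by an end, level returns to zero once the walk finishes.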
Upvotes: 2
Reputation: 20685
To iterate over all nodes, use the iter method on the ElementTree rather than looping over the root Element directly.
The root is an Element, just like the other elements in the tree, and a for loop over it yields only its immediate children. ElementTree.iter() walks every Element in the tree, starting at the root.
For example, given this XML:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
You can do the following:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     print(elem)
...
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>
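As a side note, Element objects have an iter() method too, so the same full-depth walk can start from the root or from any sub-element, and iter() accepts an optional tag to filter on. The sketch below uses a shortened, made-up version of the sample data, parsed from a string so it is self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the sample document.
xml_string = """<data>
  <country name="Liechtenstein"><rank>1</rank><neighbor name="Austria"/></country>
  <country name="Singapore"><rank>4</rank></country>
</data>"""

root = ET.fromstring(xml_string)  # fromstring returns the root Element

# Walk the whole tree starting from the root Element.
all_tags = [elem.tag for elem in root.iter()]

# Pass a tag name to visit only matching elements.
country_names = [c.get("name") for c in root.iter("country")]

# Start from a sub-element to walk just that subtree.
subtree_tags = [elem.tag for elem in root[0].iter()]

print(all_tags, country_names, subtree_tags)
```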
Upvotes: 59
Reputation: 5109
In addition to Robert Christie's accepted answer, printing the values and tags separately is very easy:
tree = ET.parse('test.xml')
for elem in tree.iter():
    print(elem.tag, elem.text)
Upvotes: 16
Reputation: 359
Adding to Robert Christie's answer, it is also possible to iterate over all nodes when parsing with fromstring(), by wrapping the resulting Element in an ElementTree:
import xml.etree.ElementTree as ET
e = ET.ElementTree(ET.fromstring(xml_string))
for elt in e.iter():
    print("%s: '%s'" % (elt.tag, elt.text))
Upvotes: 22
Reputation: 215
You can also access specific elements like this:
country = tree.findall('.//country')
Then loop over range(len(country)) and access each element by index, e.g. country[i].
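Since findall() returns a list of Element objects, it can also be looped over directly without indexing. A small sketch, assuming a made-up two-country document and using get() and findtext() to read an attribute and a child's text:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, wrapped in an ElementTree as if it had been parsed from a file.
tree = ET.ElementTree(ET.fromstring(
    '<data><country name="Panama"><rank>68</rank></country>'
    '<country name="Singapore"><rank>4</rank></country></data>'
))

countries = tree.findall(".//country")  # list of matching Element objects
ranks = {c.get("name"): int(c.findtext("rank")) for c in countries}
print(ranks)
```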
Upvotes: 13