Reputation: 833
Bear in mind I am very new to Python. I'm trying to copy a few XML nodes from sample1.xml to out.xml if they don't exist in sample2.xml.
This is how far I got before getting stuck:
import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample1.xml')
addtree = ET.ElementTree(file='sample2.xml')
root = tree.getroot()
addroot = addtree.getroot()

for adel in addroot.findall('.//cars/car'):
    for el in root.findall('cars/car'):
        with open('out.xml', 'w+') as f:
            f.write("BEFORE\n")
            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)
            f.write("\n")
            f.write("\n")
            f.write("AFTER\n")
            el = adel
            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)
I have no idea what I'm missing, but it's only copying the actual "tag" itself.
It outputs this:
BEFORE
car
car
AFTER
car
car
So I'm missing the child nodes, and also the <, >, </, > tags. Expected result is below.
sample1.xml:
<cars>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>
sample2.xml:
<cars>
    <old>
        1
    </old>
    <new>
        8
    </new>
    <car />
</cars>
Expected result in out.xml (final product):
<cars>
    <old>
        1
    </old>
    <new>
        8
    </old>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>
All the other nodes (old and new) must remain untouched. I'm just trying to replace <car /> with the car node from sample1.xml, including all of its child and grandchild nodes (if any exist).
Upvotes: 2
Views: 4448
Reputation: 56
First, a couple of trivial issues with your XML:
- the closing cars tag is missing a /
- the closing new tag incorrectly reads old; it should read new
Second, a disclaimer: my solution below has its limitations. In particular, it wouldn't handle repeatedly substituting the car node from sample1 into multiple spots in sample2. But it works fine for the sample files you've supplied.
Third: thanks to the top couple of answers on "access ElementTree node parent node"; they informed the implementation of get_node_parent_info below.
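Some background on why a helper like get_node_parent_info is needed at all: ElementTree elements don't keep a reference to their parent, so the parent has to be looked up by walking the tree. Here is a minimal sketch of that idea on its own, assuming an inline XML string as a stand-in for sample2.xml (the variable names are just for illustration):

```python
import xml.etree.ElementTree as ET

# Inline stand-in for sample2.xml so the sketch runs without any files.
root = ET.fromstring("<cars><old>1</old><new>8</new><car /></cars>")

# Map every child element to its parent by walking the whole tree once.
parent_map = {child: parent for parent in root.iter() for child in parent}

car = root.find("car")
parent = parent_map[car]
index = list(parent).index(car)  # position of <car /> among its siblings

print(parent.tag, index)  # cars 2
```

Knowing both the parent and the index is what makes an in-place replacement possible: the new node can be inserted exactly where the old one sat.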
Finally, the code:
import xml.etree.ElementTree as ET


def find_child(node, with_name):
    """Recursively find node with given name"""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            sub_result = find_child(element, with_name)
            if sub_result is not None:
                return sub_result
    return None


def replace_node(from_tree, to_tree, node_name):
    """
    Replace node with given node_name in to_tree with
    the same-named node from the from_tree
    """
    # Find nodes of given name ('car' in the example) in each tree
    from_node = find_child(from_tree.getroot(), node_name)
    to_node = find_child(to_tree.getroot(), node_name)

    # Find where to substitute the from_node into the to_tree
    to_parent, to_index = get_node_parent_info(to_tree, to_node)

    # Replace to_node with from_node
    to_parent.remove(to_node)
    to_parent.insert(to_index, from_node)


def get_node_parent_info(tree, node):
    """
    Return tuple of (parent, index) where:
        parent = node's parent within tree
        index = index of node under parent
    """
    parent_map = {c: p for p in tree.iter() for c in p}
    parent = parent_map[node]
    return parent, list(parent).index(node)


from_tree = ET.ElementTree(file='sample1.xml')
to_tree = ET.ElementTree(file='sample2.xml')

replace_node(from_tree, to_tree, 'car')

# ET.dump(to_tree)
to_tree.write('output.xml')
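As for why the attempt in the question only wrote "car": el.tag is just the element's name as a string, and el = adel merely rebinds the loop variable without changing either tree. To get the markup including children, the element has to be serialized. A small sketch of the difference, assuming a trimmed-down inline car node rather than the real sample1.xml:

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the <car> node in sample1.xml.
car = ET.fromstring(
    "<car><use-car>0</use-car><car-location>hawaii</car-location></car>"
)

# .tag is only the element's name as a string.
name = car.tag

# ET.tostring() serializes the element together with its children;
# encoding="unicode" returns a str instead of bytes.
markup = ET.tostring(car, encoding="unicode")

print(name)    # car
print(markup)  # <car><use-car>0</use-car><car-location>hawaii</car-location></car>
```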
UPDATE: It was recently brought to my attention that the implementation of find_child() in the solution I originally supplied would fail if the "child" in question was not in the first branch of the XML tree that was traversed. I've updated the implementation above to rectify this.
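Regarding the limitation mentioned in the disclaimer (substituting into multiple spots): one possible way around it is to insert a deep copy of the source node at each spot, since a single element object inserted in several places would be shared between them. This is only a sketch under assumptions; replace_all is a hypothetical helper, not part of the code above, and it works on root elements built from inline strings rather than on the sample files.

```python
import copy
import xml.etree.ElementTree as ET


def replace_all(from_root, to_root, node_name):
    """Replace every node_name element below to_root with a copy of the
    first node_name element found below from_root. Returns the number of
    replacements made. (Hypothetical helper for illustration only.)"""
    source = from_root.find(".//" + node_name)
    if source is None:
        return 0
    replaced = 0
    # Snapshot the elements first so the tree is not mutated mid-iteration.
    for parent in list(to_root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == node_name:
                # deepcopy gives each spot its own independent subtree.
                new_node = copy.deepcopy(source)
                new_node.tail = child.tail  # keep surrounding whitespace
                parent[index] = new_node
                replaced += 1
    return replaced


# Inline stand-ins for the two files, with <car /> in two different spots.
from_root = ET.fromstring("<cars><car><car-port>5</car-port></car></cars>")
to_root = ET.fromstring(
    "<cars><car /><old>1</old><garage><car /></garage></cars>"
)

count = replace_all(from_root, to_root, "car")
print(count)  # 2
print(ET.tostring(to_root, encoding="unicode"))
```

Copying also leaves the source tree untouched, whereas inserting the original node directly means the same object ends up referenced from both trees.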
Upvotes: 4