Reputation: 699
Python noob here. Wondering what's the cleanest and best way to remove all the "profile" tags with an updated attribute value of true.
I have tried the following code but it's throwing: SyntaxError("cannot use absolute path on element")
root.remove(root.findall("//Profile[@updated='true']"))
XML:
<parent>
<child type="First">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Second">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Third">
<profile>
<other> </other>
</profile>
</child>
</parent>
Upvotes: 4
Views: 11020
Reputation: 1
I looked for a way to address the elements to be deleted directly, using only the built-in xml
library. Here is the solution:
import itertools
from xml.etree import ElementTree
def removeall(root: ElementTree.Element, match, namespaces=None):
    parent_by_child = dict(itertools.chain.from_iterable(
        ((child, element) for child in element) for element in root.iter()))
    for element in root.findall(match, namespaces):
        parent_by_child[element].remove(element)
Applied on your data:
data = """
<parent>
<child type="First">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Second">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Third">
<profile>
<other> </other>
</profile>
</child>
</parent>"""
root = ElementTree.fromstring(data)
removeall(root, ".//child/profile[@updated='true']")
print(ElementTree.tostring(root, encoding='unicode'))
Prints:
<parent>
<child type="First">
</child>
<child type="Second">
</child>
<child type="Third">
<profile>
<other> </other>
</profile>
</child>
</parent>
As the question is tagged python2.7: I have to admit that I do not know whether itertools.chain.from_iterable,
which I used to build the child-to-parent dict, was already available in Python 2.7.
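For what it's worth, the same child-to-parent map can also be written as a plain dict comprehension, which should run unchanged on Python 2.7 and Python 3 and avoids both itertools and the type annotation. A minimal sketch, assuming a cut-down sample document; the names remove_all and parent_of are made up here and are not part of ElementTree:

```python
from xml.etree import ElementTree


def remove_all(root, match, namespaces=None):
    # ElementTree elements keep no link to their parent, so build the map once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so removing elements while looping is safe.
    for element in root.findall(match, namespaces):
        parent_of[element].remove(element)


# Cut-down sample: one profile with updated="true", one without.
root = ElementTree.fromstring(
    "<parent><child><profile updated='true'/></child>"
    "<child><profile/></child></parent>")
remove_all(root, ".//profile[@updated='true']")
remaining = root.findall(".//profile")
print(len(remaining))
```

Note that the root element itself has no entry in the map, so this sketch only works for matches below the root.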
Upvotes: 0
Reputation: 473763
If you are using xml.etree.ElementTree, you should use the remove() method to remove a node, but this requires you to have the parent node reference. Hence, the solution:
import xml.etree.ElementTree as ET
data = """
<parent>
<child type="First">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Second">
<profile updated="true">
<other> </other>
</profile>
</child>
<child type="Third">
<profile>
<other> </other>
</profile>
</child>
</parent>"""
root = ET.fromstring(data)
for child in root.findall("child"):
    # remove() only accepts direct children, so match profiles directly under child
    for profile in child.findall("profile[@updated='true']"):
        child.remove(profile)
print(ET.tostring(root))
Prints:
<parent>
<child type="First">
</child>
<child type="Second">
</child>
<child type="Third">
<profile>
<other> </other>
</profile>
</child>
</parent>
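As for why the one-liner in the question fails, three things seem to go wrong at once: Element.findall() rejects a path that starts with "/", tag names are case-sensitive ("Profile" is not "profile"), and remove() takes a single direct child rather than a list of matches. A small sketch reproducing each point, assuming a cut-down document made up for illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<parent><child><profile updated='true'/></child></parent>")

# 1. A leading "/" makes the path absolute, which Element.findall() rejects.
try:
    root.findall("//profile[@updated='true']")
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)

# 2. Tag names are case-sensitive, so "Profile" matches nothing.
wrong_case = root.findall(".//Profile[@updated='true']")

# 3. remove() only accepts a direct child of the element it is called on.
profile = root.find(".//profile[@updated='true']")
try:
    root.remove(profile)  # profile is a grandchild of root, not a child
    removed_from_root = True
except ValueError:
    removed_from_root = False

print(absolute_path_error, wrong_case, removed_from_root)
```

Starting the path with ".//" and calling remove() on the actual parent, as in the loop above, avoids all three problems.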
Note that with lxml.etree this would be a bit simpler:
root = ET.fromstring(data)
for profile in root.xpath(".//child/profile[@updated='true']"):
    profile.getparent().remove(profile)
where ET is:
import lxml.etree as ET
Upvotes: 8