user1195192

Reputation: 699

ElementTree Remove Element

Python noob here. What is the cleanest way to remove all "profile" elements whose "updated" attribute is "true"?

I tried the following code, but it raises SyntaxError("cannot use absolute path on element"):

 root.remove(root.findall("//Profile[@updated='true']"))

XML:

<parent>
  <child type="First">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Second">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Third">
     <profile>
       <other> </other>
    </profile>
  </child>
</parent>

Upvotes: 4

Views: 11020

Answers (2)

πrr

Reputation: 1

I was looking for a way to address the elements to be deleted directly, using only the built-in xml library. Here is the solution:

import itertools
from xml.etree import ElementTree

def removeall(root: ElementTree.Element, match, namespaces=None):
    parent_by_child=dict(itertools.chain.from_iterable(
        ((child, element) for child in element) for element in root.iter()))

    for element in root.findall(match, namespaces):
        parent_by_child[element].remove(element)

Applied on your data:

data = """
<parent>
  <child type="First">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Second">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Third">
     <profile>
       <other> </other>
    </profile>
  </child>
</parent>"""

root = ElementTree.fromstring(data)
removeall(root, ".//child/profile[@updated='true']")
print(ElementTree.tostring(root, encoding='unicode'))

Prints:

<parent>
  <child type="First">
    </child>
  <child type="Second">
    </child>
  <child type="Third">
     <profile>
       <other> </other>
    </profile>
  </child>
</parent>

As the question is tagged python2.7: itertools.chain.from_iterable, which I used to build the child-to-parent dict, has been available since Python 2.6. However, the type annotation in the function signature and encoding='unicode' in the tostring() call require Python 3.
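A minimal sketch of the same parent-map idea that avoids Python 3-only syntax, so it should also run on Python 2.7. The name remove_all_compat and the shortened sample document are made up for this illustration; a dict comprehension replaces itertools.chain.from_iterable and the type annotation is dropped.

```python
from xml.etree import ElementTree


def remove_all_compat(root, match, namespaces=None):
    # Hypothetical helper name; same approach as removeall() above.
    # Map every element to its parent so matches can be removed directly.
    parent_by_child = {
        child: parent for parent in root.iter() for child in parent
    }
    # findall() returns a list, so removing while looping is safe.
    for element in root.findall(match, namespaces):
        parent_by_child[element].remove(element)


# Shortened sample data, assumed for illustration only.
root = ElementTree.fromstring(
    '<parent><child><profile updated="true"/></child>'
    '<child><profile/></child></parent>'
)
remove_all_compat(root, ".//child/profile[@updated='true']")
remaining = root.findall(".//profile")
```

The parent map is built once before any removal, so the lookups stay valid while elements are being detached.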

Upvotes: 0

alecxe

Reputation: 473763

If you are using xml.etree.ElementTree, use the remove() method to remove a node. It has to be called on the node's parent, so you need a reference to the parent node. Hence, the solution:

import xml.etree.ElementTree as ET

data = """
<parent>
  <child type="First">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Second">
    <profile updated="true">
       <other> </other>
    </profile>
  </child>
  <child type="Third">
     <profile>
       <other> </other>
    </profile>
  </child>
</parent>"""

root = ET.fromstring(data)
for child in root.findall("child"):
    # remove() only accepts direct children, so match direct children only
    for profile in child.findall("profile[@updated='true']"):
        child.remove(profile)

print(ET.tostring(root))

Prints:

<parent>
  <child type="First">
    </child>
  <child type="Second">
    </child>
  <child type="Third">
     <profile>
       <other> </other>
    </profile>
  </child>
</parent>

Note that with lxml.etree this would be a bit simpler:

root = ET.fromstring(data)
for profile in root.xpath(".//child/profile[@updated='true']"):
    profile.getparent().remove(profile)

where ET is:

import lxml.etree as ET
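As for the error in the question, here is a small sketch (assuming the standard-library xml.etree.ElementTree and a shortened sample document made up for this illustration) of what goes wrong with the original call: an Element rejects paths that start with "/", tag names are case-sensitive, and findall() returns a list while remove() expects a single element.

```python
import xml.etree.ElementTree as ET

# Shortened sample data, assumed for illustration only.
root = ET.fromstring(
    '<parent><child><profile updated="true"/></child></parent>'
)

# A path starting with "/" is rejected when searching from an Element.
error = None
try:
    root.findall("//profile[@updated='true']")
except SyntaxError as exc:
    error = str(exc)

# A relative path (leading ".") is accepted.
matches = root.findall(".//profile[@updated='true']")

# Tag names are case-sensitive: "Profile" does not match "profile".
wrong_case = root.findall(".//Profile[@updated='true']")
```

Even with a corrected path, the result is a list, so each element still has to be removed from its own parent one at a time.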

Upvotes: 8
