speedyrazor

Reputation: 3215

Remove a tag and its contents based on a child tag's value - Python lxml

I am trying to locate a particular tag based on its child's contents and remove that tag together with everything inside it, but I can't find an answer. Here is my XML:

<video>
    <crew billing="top">
      <name>Some Guy</name>
      <roles>
        <role>Painter</role>
        <role>Decorator</role>
      </roles>
    </crew>
    <crew billing="top">
      <name>Another Guy</name>
      <roles>
        <role>Primary</role>
      </roles>
    </crew>
</video>

What I want to do is check whether <role>Primary</role> exists in a <crew> block and, if it does, delete the whole <crew> block that contains it. The result would then be:

<video>
    <crew billing="top">
      <name>Some Guy</name>
      <roles>
        <role>Painter</role>
        <role>Decorator</role>
      </roles>
    </crew>
</video>

The matching block is not always the last one; it may be buried among many other <crew> blocks. Whichever <crew> block contains <role>Primary</role>, I want to remove that whole block. I have tried:

for find1 in root.iter(tag='role'):
    find1 = find1.text
    if find1 == "Primary":
        path = tree.xpath('//video/crew')
        etree.strip_elements(path, 'member')

but this removes every <crew> tag and its contents. Kind regards.
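For comparison, the loop above can be repaired without a separate query: instead of re-selecting every //video/crew, walk up from each matching <role> to its own enclosing <crew> with lxml's iterancestors() and remove only that element. This is a minimal sketch, assuming lxml is installed and the input has the well-formed shape shown above (each <crew> directly under <video>); the sample string is modeled on the question, not taken from a real file.

```python
from lxml import etree

# Sample modeled on the question: each <crew> entry sits directly under <video>
xml = b"""<video>
    <crew billing="top">
      <name>Some Guy</name>
      <roles>
        <role>Painter</role>
        <role>Decorator</role>
      </roles>
    </crew>
    <crew billing="top">
      <name>Another Guy</name>
      <roles>
        <role>Primary</role>
      </roles>
    </crew>
</video>"""

root = etree.fromstring(xml)

# Collect the matches first: removing elements while root.iter() is still
# walking the tree can make the iterator skip nodes.
primary_roles = [r for r in root.iter(tag="role") if r.text == "Primary"]

for role in primary_roles:
    # Climb from this <role> to the nearest enclosing <crew> only
    crew = next(role.iterancestors(tag="crew"), None)
    # getparent() is None if this <crew> was already removed (e.g. two
    # "Primary" roles in the same block), so skip it in that case
    if crew is not None and crew.getparent() is not None:
        crew.getparent().remove(crew)

print(etree.tostring(root, encoding="unicode"))
```

Comparing r.text == "Primary" is an exact match, so a role like "Primary Artist" would not trigger a removal here.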

Upvotes: 1

Views: 856

Answers (1)

falsetru

Reputation: 369074

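Using XPath, select every <crew> element that has a <role> descendant containing "Primary", then remove each one through its parent: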

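# Every <crew> that has a <role> descendant whose text contains "Primary"
for crew in root.xpath('.//crew[descendant::role[contains(text(), "Primary")]]'):
    # An lxml element is removed by asking its parent to drop it
    crew.getparent().remove(crew)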
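A self-contained sketch of the query above, assuming lxml is installed. The input is modeled on the question, plus one hypothetical "Third Guy" entry whose role is "Primary Artist", added only to show that contains() is a substring test. It contrasts that predicate with an exact text() = "Primary" predicate and then removes the exact matches.

```python
from lxml import etree

# Sample modeled on the question, plus one hypothetical entry
# ("Third Guy") to show how contains() treats a longer role name.
xml = b"""<video>
    <crew billing="top">
      <name>Some Guy</name>
      <roles><role>Painter</role><role>Decorator</role></roles>
    </crew>
    <crew billing="top">
      <name>Another Guy</name>
      <roles><role>Primary</role></roles>
    </crew>
    <crew billing="top">
      <name>Third Guy</name>
      <roles><role>Primary Artist</role></roles>
    </crew>
</video>"""

root = etree.fromstring(xml)

# The answer's predicate: substring test, so "Primary Artist" matches too
loose = root.xpath('.//crew[descendant::role[contains(text(), "Primary")]]')
# Exact-match predicate: only a <role> whose text is exactly "Primary"
exact = root.xpath('.//crew[descendant::role[text() = "Primary"]]')

print([c.findtext("name") for c in loose])  # ['Another Guy', 'Third Guy']
print([c.findtext("name") for c in exact])  # ['Another Guy']

# xpath() returns a plain Python list, so removing while looping is safe
for crew in exact:
    crew.getparent().remove(crew)

print(etree.tostring(root, encoding="unicode"))
```

contains() is the right choice when any role mentioning "Primary" should count; the exact form avoids accidental matches on longer role names.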

Upvotes: 2
