Brad

Reputation: 6332

Find non-root parent node where child contains some text

I have some XML:

<root>
    <parent>
        <child>foo987654</child>
    </parent>
    <parent>
        <child>bar15245</child>
    </parent>
    <parent>
        <child>baz87742</child>
    </parent>
    <parent>
        <child>foo123456</child>
    </parent>
</root>

I'm using Python and the etree module, and I'd like to select all <parent> nodes whose child starts with "foo". I know etree has limited XPath support, but I'm an XPath rookie, so I'm struggling to land on the best solution. I'd think something to this effect:

parent[(contains(child,'foo'))] 

But I want to reject parent nodes whose child contains foo without starting with it (i.e. <child>125456foo</child>), so I'm not sure this would work. Furthermore, I'm not sure etree supports this level of XPath...

EDIT:

Another acceptable solution would be to select parents whose children's text is in a list. Pseudocode: parent=>child[text = "foo1" || "bar1" || "bar2"]

Is that possible?

Upvotes: 3

Views: 1165

Answers (3)

Guy Gavriely

Reputation: 11396

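With XPath, using lxml (which supports starts-with()):

from lxml import etree
doc = etree.fromstring(s)  # s holds the XML string from the question
# Select each <parent> whose <child> text starts with 'foo'
for e in doc.xpath("./parent[starts-with(child, 'foo')]"):
    print(e.find('child').text)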

Upvotes: 0

Inbar Rose

Reputation: 43437

This will get what you want:

[elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]

Watch it in action:

s = """<root>
    <parent>
        <child>foo987654</child>
    </parent>
    <parent>
        <child>bar15245</child>
    </parent>
    <parent>
        <child>baz87742</child>
    </parent>
    <parent>
        <child>foo123456</child>
    </parent>
</root>"""

import xml.etree.ElementTree as ET

root = ET.fromstring(s)
elems = [elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]

Checking the data:

for elem in elems:
    print(elem.find('child').text)
>>>
foo987654
foo123456
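
For the list variant mentioned in the question's edit, a similar comprehension might work. This is only a minimal sketch: the values foo1, bar1 and bar2 come from the question's pseudocode, and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not the XML from the question
s = """<root>
    <parent><child>foo1</child></parent>
    <parent><child>bar1</child></parent>
    <parent><child>baz9</child></parent>
    <parent><child>bar2</child></parent>
</root>"""

# Accepted child texts, taken from the question's pseudocode
wanted = {"foo1", "bar1", "bar2"}

root = ET.fromstring(s)
# findtext returns None when a <parent> has no <child>;
# None is never in the set, so such parents are skipped
matches = [p for p in root.findall("parent") if p.findtext("child") in wanted]

for p in matches:
    print(p.findtext("child"))
```

Using a set keeps the membership test cheap if the list of accepted values grows.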

Upvotes: 4

Paul Mougel

Reputation: 17038

As you can see from the xml.etree documentation, this library doesn't support the XPath contains() function (nor starts-with()). My suggestion would be to select all parent elements with the path ./parent and then iterate over the results, discarding the parents whose child's text does not start with foo.
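
A rough sketch of that select-then-filter idea, assuming each parent has at most one relevant child. The helper name parents_with_prefix and the extra sample elements (an empty child, a parent with no child) are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parents_with_prefix(root, prefix):
    """Return <parent> elements whose first <child> text starts with prefix."""
    result = []
    for parent in root.findall("./parent"):
        # default="" covers a missing <child>; "or ''" guards against empty text
        text = parent.findtext("child", default="") or ""
        if text.startswith(prefix):
            result.append(parent)
    return result

# Hypothetical sample: includes a child that merely contains "foo",
# an empty child, and a parent without any child
root = ET.fromstring(
    "<root>"
    "<parent><child>foo987654</child></parent>"
    "<parent><child>125456foo</child></parent>"
    "<parent><child/></parent>"
    "<parent/>"
    "<parent><child>foo123456</child></parent>"
    "</root>"
)

found = parents_with_prefix(root, "foo")
print([p.findtext("child") for p in found])
```

Going through findtext avoids an AttributeError when a parent has no child element at all.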

Upvotes: 0
