Reputation: 6332
I have some XML:
<root>
<parent>
<child>foo987654</child>
</parent>
<parent>
<child>bar15245</child>
</parent>
<parent>
<child>baz87742</child>
</parent>
<parent>
<child>foo123456</child>
</parent>
</root>
I'm using Python and the etree module, and I'd like to select all <parent> nodes whose child starts with "foo". I know etree has limited XPath support, but I'm an XPath rookie, so I'm struggling to land on the best solution. I'd think something to this effect:
parent[(contains(child,'foo'))]
but I would want to reject parent nodes that contain foo but don't start with foo (i.e. <child>125456foo</child>), so I'm not sure this would work. Furthermore, I'm not sure etree supports this level of XPath...
EDIT:
Another acceptable solution would be to select parents whose children's text is in a list. Pseudocode: parent=>child[text = "foo1" || "bar1" || "bar2"]
Is that possible?
Upvotes: 3
Views: 1165
Reputation: 11396
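With XPath, using lxml:
import lxml.html
# s is the XML string from the question
doc = lxml.html.document_fromstring(s)
# starts-with() keeps only <child> elements whose text begins with 'foo';
# use ".//parent[starts-with(child, 'foo')]" to get the <parent> elements instead
for e in doc.xpath(".//child[starts-with(text(), 'foo')]"):
    print(e.text)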
Upvotes: 0
Reputation: 43437
This will get what you want:
[elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]
Watch it in action:
s = """<root>
<parent>
<child>foo987654</child>
</parent>
<parent>
<child>bar15245</child>
</parent>
<parent>
<child>baz87742</child>
</parent>
<parent>
<child>foo123456</child>
</parent>
</root>"""
import xml.etree.ElementTree as ET
root = ET.fromstring(s)
elems = [elem for elem in root.findall('parent') if elem.find('child').text.startswith('foo')]
Checking the data:
for elem in elems:
    print(elem.find('child').text)
>>>
foo987654
foo123456
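One caveat: elem.find('child').text raises AttributeError if a <parent> has no <child>, or if the <child> is empty (its text is None). A minimal sketch of a more defensive variant, assuming you simply want to skip such parents; the helper name parents_with_prefix and the extra sample rows are made up for illustration:

```python
import xml.etree.ElementTree as ET

def parents_with_prefix(root, prefix):
    """Return <parent> elements whose first <child> text starts with prefix."""
    return [
        parent
        for parent in root.findall("parent")
        # findtext() yields "" for a missing or empty <child>, so no AttributeError
        if parent.findtext("child", default="").startswith(prefix)
    ]

# Hypothetical sample: includes a suffix match, an empty child and a bare parent
sample = """<root>
<parent><child>foo987654</child></parent>
<parent><child>125456foo</child></parent>
<parent><child/></parent>
<parent/>
<parent><child>foo123456</child></parent>
</root>"""

for parent in parents_with_prefix(ET.fromstring(sample), "foo"):
    print(parent.findtext("child"))
```

This should print only foo987654 and foo123456; the 125456foo parent is rejected because the test is startswith rather than a substring check.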
Upvotes: 4
Reputation: 17038
As you can see from the xml.etree documentation, this library doesn't support the contains() function from XPath. My suggestion would be to select all parent elements with the path parent, then iterate over the results and discard those whose child's text does not start with foo.
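The same select-then-filter idea also covers the variant from the question's EDIT (child text in a list). A rough sketch with the standard library only; the values in wanted are placeholders picked from the sample data, not anything the asker specified:

```python
import xml.etree.ElementTree as ET

xml_text = """<root>
<parent><child>foo987654</child></parent>
<parent><child>bar15245</child></parent>
<parent><child>baz87742</child></parent>
<parent><child>foo123456</child></parent>
</root>"""

root = ET.fromstring(xml_text)

# Placeholder list of accepted child texts (assumption for illustration)
wanted = {"foo987654", "baz87742"}

# Select every <parent>, then keep those whose <child> text is in the set;
# findtext() returns None when there is no <child>, which is never in the set
matches = [p for p in root.findall("parent") if p.findtext("child") in wanted]

# For a single exact value, ElementTree's limited XPath has a [tag='text'] predicate
exact = root.findall("parent[child='bar15245']")

print([p.findtext("child") for p in matches])
print(len(exact))
```

A set makes the membership test cheap, and the filtering stays in Python, so it does not depend on which XPath functions etree happens to support.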
Upvotes: 0