Reputation: 288
I have an XML document to parse that is proving really tricky for me.
    <bundles>
      <bundle>
        <bitstreams>
          <bitstream>
            <id>1234</id>
          </bitstream>
        </bitstreams>
        <name>FOO</name>
      </bundle>
      <bundle> ... </bundle>
    </bundles>
I would like to iterate through this XML and locate all the id values inside of the bitstreams for a bundle where the name element's value is 'FOO'. I'm not interested in any bundles not named 'FOO', and there may be any number of bundles and any number of bitstreams in the bundles.
I have been using tree.findall('./bundle/name') to find the FOO bundle, but this just returns a list that I can't step through for the id values:
    for node in tree.findall('./bundle/name'):
        if node.text == 'FOO':
            id_values = tree.findall('./bundle/bitstreams/bitstream/id')
            for value in id_values:
                print(value.text)
This prints out all the id values, not those of the bundle 'FOO'.
How can I iterate through this tree, locate the bundle with the name FOO, take this bundle node and collect the id values nested in it? Is the XPath argument incorrect here?
I'm working in Python with the lxml bindings, but I believe any XML parser would be all right; these aren't large XML trees.
Upvotes: 3
Views: 3641
Reputation: 7348
One of your questions was "Is the XPath argument incorrect here?". Well, findall() doesn't accept full XPath expressions; it uses a simplified subset called ElementPath. Also, your second call to findall() is not related in any way to the result of the first one: it searches from tree again, so it returns the ids of all bundles.
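As a rough illustration of what ElementPath can still do: it supports a child-text predicate of the form [tag='text'], which is enough to select only the FOO bundle in a single findall() call. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages; the sample data (two bundles, extra ids) is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two bundles, only the first one is named FOO.
data = """
<bundles>
  <bundle>
    <bitstreams>
      <bitstream><id>1234</id></bitstream>
      <bitstream><id>5678</id></bitstream>
    </bitstreams>
    <name>FOO</name>
  </bundle>
  <bundle>
    <bitstreams>
      <bitstream><id>9999</id></bitstream>
    </bitstreams>
    <name>BAR</name>
  </bundle>
</bundles>
"""

root = ET.fromstring(data)

# [name='FOO'] keeps only <bundle> elements that have a <name> child
# whose text is exactly 'FOO'; the rest of the path descends to the ids.
foo_ids = [el.text for el in root.findall("./bundle[name='FOO']/bitstreams/bitstream/id")]
print(foo_ids)
```

Because the predicate sits on the bundle step, the path never has to climb back up to a parent element.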
A slight modification to your code should also work (it's basically the same as the XPath expression):
    for node in tree.findall('./bundle/name'):
        if node.text != 'FOO':
            continue
        # Search relative to this bundle only, not the whole tree.
        id_values = node.getparent().findall('./bitstreams/bitstream/id')
        for value in id_values:
            print(value.text)
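Note that getparent() is an lxml extension; elements from the standard library's xml.etree.ElementTree have no such method. A variation that works with either parser is to loop over the bundle elements themselves and check each one's name, so no step back up to the parent is needed. The following is only a sketch under that assumption; the helper name ids_for_bundle and the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET


def ids_for_bundle(root, wanted_name):
    """Collect the text of every <id> inside bundles whose <name> equals wanted_name."""
    ids = []
    for bundle in root.findall("bundle"):
        # findtext() returns None when a bundle has no <name> child.
        if bundle.findtext("name") != wanted_name:
            continue
        # The search is relative to this bundle, so other bundles are ignored.
        ids.extend(el.text for el in bundle.findall("bitstreams/bitstream/id"))
    return ids


# Hypothetical sample data with one matching and one non-matching bundle.
sample = """
<bundles>
  <bundle>
    <bitstreams>
      <bitstream><id>1234</id></bitstream>
      <bitstream><id>5678</id></bitstream>
    </bitstreams>
    <name>FOO</name>
  </bundle>
  <bundle>
    <bitstreams>
      <bitstream><id>9999</id></bitstream>
    </bitstreams>
    <name>BAR</name>
  </bundle>
</bundles>
"""

root = ET.fromstring(sample)
print(ids_for_bundle(root, "FOO"))
```

Starting from the bundle also handles any number of bundles and any number of bitstreams per bundle, as asked in the question.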
Upvotes: 2
Reputation: 1177
You can use XPath to achieve this. The following Python code works:
    import libxml2

    data = """
    <bundles>
      <bundle>
        <bitstreams>
          <bitstream>
            <id>1234</id>
          </bitstream>
        </bitstreams>
        <name>FOO</name>
      </bundle>
    </bundles>
    """

    doc = libxml2.parseDoc(data)
    # Select the FOO name element, step up to its bundle, then down to the ids.
    for node in doc.xpathEval('/bundles/bundle/name[.="FOO"]/../bitstreams/bitstream/id'):
        print(node)
or using lxml (data is the same as in the example above):
    from lxml import etree

    bundles = etree.fromstring(data)
    for node in bundles.xpath('bundle/name[.="FOO"]/../bitstreams/bitstream/id'):
        print(node.text)
outputs:
1234
If the <bitstreams> element always precedes the <name> element, you can also use the more efficient XPath expression:

    'bundle/name[.="FOO"]/preceding-sibling::bitstreams/bitstream/id'
Upvotes: 6