Reputation: 11734
I have the following xml:
<document>
<internal-code code="201">
<internal-desc>Biscuits Wrapped</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits (Wrapped)</web-sub-category>
</internal-code>
<internal-code code="202">
<internal-desc>Biscuits Sweet</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits (Sweets)</web-sub-category>
</internal-code>
<internal-code code="221">
<internal-desc>Biscuits Savoury</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits For Cheese</web-sub-category>
</internal-code>
....
</document>
I have loaded it into a tree using this code:
try:
groups = etree.parse(PRODUCT_GROUPS_XML_FILEPATH)
root = groups.getroot()
internalGroup = root.findall("./internal-code")
LOG.append("[INFO] product groupings file loaded and parsed ok")
except Exception as e:
LOG.append("[ERROR] PRODUCT GROUPINGS XML FILE ACCESS PROBLEM")
LOG.append("[***TERMINATED***]")
writelog()
exit()
I would like to use XPath to find the correct and then be able to access the child nodes of that group. So if I am searching for internal-code 221 and want web-category I would do something like:
internalGroup.find("internal-code", 221).get("web-category").text
I am not experienced with XML and Python and I have been staring at this for ages. All help very gratefully received. Thanks
Upvotes: 7
Views: 12960
Reputation: 28666
While I'm a big fan of lxml (see falsetru's answer), which you would need for full xpath support, the standard library's elementtree implementation does support enough to get what you need:
root.findtext('.//internal-code[@code="221]/web-category')
This returns the text
property of the first matching element, which is enough if you are sure that code 221 will only occur once. If there could be more and you need a list:
[i.text for i in root.findall('.//internal-code[@code="221"]/web-category')]
(note that these examples would also work in lxml)
Upvotes: 2
Reputation: 369074
According to xml.etree.ElementTree
documentation:
XPath support
This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
Use lxml
:
>>> import lxml.etree as ET
>>>
>>> s = '''
... <document>
... <internal-code code="201">
... <internal-desc>Biscuits Wrapped</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits (Wrapped)</web-sub-category>
... </internal-code>
... <internal-code code="202">
... <internal-desc>Biscuits Sweet</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits (Sweets)</web-sub-category>
... </internal-code>
... <internal-code code="221">
... <internal-desc>Biscuits Savoury</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits For Cheese</web-sub-category>
... </internal-code>
... </document>
... '''
>>>
>>> root = ET.fromstring(s)
>>> for text in root.xpath('.//internal-code[@code="221"]/web-category/text()'):
... print(text)
...
Biscuits
Upvotes: 2