Using Python's lxml library, I'm trying to parse an XML document like this one:
<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
  <ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd" xmlns:ax23="http://metadata.itis_service.itis.usgs.gov/xsd" xmlns:ax26="http://itis_service.itis.usgs.gov/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ax21:SvcScientificNameList">
    <ax21:scientificNames xsi:type="ax21:SvcScientificName">
      <ax21:tsn>26339</ax21:tsn>
      <ax21:author>L.</ax21:author>
      <ax21:combinedName>Vicia faba</ax21:combinedName>
      <ax21:kingdom>Plantae</ax21:kingdom>
      <ax21:unitInd1 xsi:nil="true" />
      <ax21:unitInd2 xsi:nil="true" />
      <ax21:unitInd3 xsi:nil="true" />
      <ax21:unitInd4 xsi:nil="true" />
      <ax21:unitName1>Vicia</ax21:unitName1>
      <ax21:unitName2>faba</ax21:unitName2>
      <ax21:unitName3 xsi:nil="true" />
      <ax21:unitName4 xsi:nil="true" />
    </ax21:scientificNames>
  </ns:return>
</ns:searchByScientificNameResponse>
Specifically, I want to get the value of the "ax21:tsn" element (in this case, the integer 26339).
I tried the answers from here and here, without success. Here is my code:
import lxml.etree as ET
tree = ET.parse("sample.xml")
#print(ET.tostring(tree))
namespaces = {'ax21': 'http://data.itis_service.itis.usgs.gov/xsd'}
tsn = tree.find('scientificNames/tsn', namespaces)
print(tsn)
It just prints None. Is there a clean way of doing this using XPath?
Two problems:
1. scientificNames is not a direct child of the root element; it is a grandchild.
2. You need to use the ax21 prefix in the path expression. An unprefixed name such as tsn only matches elements that are in no namespace.
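To see both problems in isolation, here is a minimal sketch. It embeds a trimmed copy of the sample (only the tsn child is kept, so no sample.xml is needed), and it adds an "ns" prefix to the mapping for the root's namespace; the variable name XML and the extra "ns" entry are assumptions made for illustration. Printing the tag of the root's first child shows that lxml stores namespaced names as {namespace-uri}localname, which is why a bare scientificNames can never match.

```python
import lxml.etree as ET

# Trimmed copy of the question's sample document (hypothetical variable name).
XML = b"""<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
  <ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
    <ax21:scientificNames>
      <ax21:tsn>26339</ax21:tsn>
    </ax21:scientificNames>
  </ns:return>
</ns:searchByScientificNameResponse>"""

tree = ET.ElementTree(ET.fromstring(XML))
namespaces = {
    "ns": "http://itis_service.itis.usgs.gov",  # assumed extra prefix for the root's namespace
    "ax21": "http://data.itis_service.itis.usgs.gov/xsd",
}

# lxml keeps qualified names in Clark notation: {namespace-uri}localname
root_child = tree.getroot()[0]
print(root_child.tag)  # {http://itis_service.itis.usgs.gov}return

# Both problems: unprefixed names, and the path only looks at the root's children.
no_prefix = tree.find("scientificNames/tsn", namespaces)
print(no_prefix)  # None

# Adding the prefix alone is not enough: scientificNames is a grandchild of the root.
prefix_only = tree.find("ax21:scientificNames/ax21:tsn", namespaces)
print(prefix_only)  # None

# Spelling out the full path from the root (through ns:return) does match.
full_path = tree.find("ns:return/ax21:scientificNames/ax21:tsn", namespaces)
print(full_path.text)  # 26339
```

Writing out the full path is brittle if the response wrapper changes, which is why a descendant search with .// is usually the more convenient choice.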
The following works:
tsn = tree.find('.//ax21:scientificNames/ax21:tsn', namespaces)
Or simply:
tsn = tree.find('.//ax21:tsn', namespaces)
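Note that find() returns the element itself (or None), not its value, and it only understands ElementPath, a subset of XPath. Since the question asks for the integer via XPath, here is a minimal sketch that reads .text and converts it, and also uses lxml's xpath() method, which evaluates full XPath 1.0 and takes the prefix mapping through its namespaces keyword. The trimmed sample embedded as XML is again an assumption for self-containment.

```python
import lxml.etree as ET

# Trimmed copy of the question's sample document (hypothetical variable name).
XML = b"""<ns:searchByScientificNameResponse xmlns:ns="http://itis_service.itis.usgs.gov">
  <ns:return xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
    <ax21:scientificNames>
      <ax21:tsn>26339</ax21:tsn>
    </ax21:scientificNames>
  </ns:return>
</ns:searchByScientificNameResponse>"""

tree = ET.ElementTree(ET.fromstring(XML))
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# ElementPath: find() gives the Element; the value is its text content.
tsn_element = tree.find(".//ax21:tsn", namespaces)
tsn = int(tsn_element.text) if tsn_element is not None else None
print(tsn)  # 26339

# Full XPath: text() yields a list of strings, empty if nothing matched.
tsn_texts = tree.xpath("//ax21:tsn/text()", namespaces=namespaces)
tsn_from_xpath = int(tsn_texts[0]) if tsn_texts else None
print(tsn_from_xpath)  # 26339
```

The None checks guard against responses where no tsn element is present, instead of failing with an AttributeError or IndexError.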