Problem parsing XML document with namespaces using Python lxml

Question

Using the Python lxml library, I'm trying to parse an XML document as follows:




26339
L.
Vicia faba
Plantae




Vicia
faba

Specifically, I want to get the value of the "ax21:tsn" element (in this case, the integer 26339).

I tried the answers from here and here, without success. Here is my code:

import lxml.etree as ET

tree = ET.parse("sample.xml")
#print(ET.tostring(tree))

namespaces = {'ax21': 'http://data.itis_service.itis.usgs.gov/xsd'} 
tsn = tree.find('scientificNames/tsn', namespaces)
print(tsn)

It just returns nothing. Is there a really intelligent way of doing this using XPath?

mzjn · Accepted Answer

Two problems:

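scientificNames is not a direct child of the root element; it is a grandchild, so the path has to descend to it (for example with .//).
The elements are in the ax21 namespace, so you need to use the ax21 prefix on each step of the XPath expression.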

The following works:

tsn = tree.find('.//ax21:scientificNames/ax21:tsn', namespaces)

Or simply:

tsn = tree.find('.//ax21:tsn', namespaces)
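
To actually get the integer, read the text of the element that find() returns. Below is a minimal, self-contained sketch of both the failing and the working lookup. The inline document is only a hypothetical stand-in for sample.xml: the wrapper element names (response, return) and the "ns" namespace URI are assumptions, and only the ax21 URI, the scientificNames/tsn elements, and the value 26339 come from the question. The fallback import is there only so the snippet also runs where lxml is not installed; the standard library's find() accepts the same namespaces mapping.

```python
try:
    import lxml.etree as ET  # the library used in the question
except ImportError:
    # Fallback for environments without lxml; find() works the same way here.
    import xml.etree.ElementTree as ET

# Hypothetical stand-in for sample.xml (wrapper names and the "ns" URI are
# assumptions; the ax21 URI and the tsn value are taken from the question).
SAMPLE = b"""<ns:response xmlns:ns="http://example.com/ns"
    xmlns:ax21="http://data.itis_service.itis.usgs.gov/xsd">
  <ns:return>
    <ax21:scientificNames>
      <ax21:tsn>26339</ax21:tsn>
    </ax21:scientificNames>
  </ns:return>
</ns:response>"""

root = ET.fromstring(SAMPLE)
namespaces = {"ax21": "http://data.itis_service.itis.usgs.gov/xsd"}

# Original attempt: no prefix and no descent below the root, so no match.
missing = root.find("scientificNames/tsn", namespaces)

# Fixed path: descend with .// and qualify each step with the ax21 prefix.
tsn = root.find(".//ax21:scientificNames/ax21:tsn", namespaces)

# find() returns an element (or None); the value itself is in .text.
tsn_value = int(tsn.text) if tsn is not None else None

print(missing, tsn_value)  # prints: None 26339
```

With the real file, replacing ET.fromstring(SAMPLE) with ET.parse("sample.xml") and calling find() on the resulting tree should behave the same way, assuming the tsn element sits in the ax21 namespace as described in the question.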
