Mansur

Reputation: 1829

lxml xpath() function does not work with correct XPath query

I am trying to evaluate some XPath queries using the lxml library, but the query below returns no matches and I can't see why. Here's the code:

from lxml import etree

if __name__ == '__main__':
    xml = r'''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" revision="0.9.5" language="Java" filename="File.java"><package>package <name><name>com</name><operator>.</operator><name>samples</name><operator>.</operator><name>e978092668</name></name>;</package>
<class><annotation>@<name>Path</name></annotation>
<specifier>public</specifier> class <name>Correct</name> <block>{
    <decl_stmt><decl><annotation>@<name>Inject</name></annotation>
    <specifier>private</specifier> <type><name>JsonWebToken</name></type> <name>field</name></decl>;</decl_stmt>
}</block></class>
</unit>'''.encode("UTF-8")

    xpath = '''unit/class[((descendant-or-self::decl_stmt/decl[(type[name[text()='JsonWebToken']] and annotation[name[text()='Inject']])]) and (annotation[name[text()='Path']]))]'''
    tree = etree.fromstring(xml)
    a = tree.xpath(xpath)
    print(len(a))  # prints 0, i.e. no matches

I tried the exact same XPath query with the exact same XML string on freeformatter.com, and there it works and shows the match. I don't know what's wrong with my own code; for the most part, I followed the official tutorial on the website.

Edit 1:

I also tried it with a namespace prefix, with the same result:

    xpath = '''src:unit/src:class[((descendant-or-self::src:decl_stmt/src:decl[(src:type[src:name[text()='JsonWebToken']] and src:annotation[src:name[text()='Inject']])]) and (src:annotation[src:name[text()='Path']]))]'''
    tree = etree.fromstring(xml)
    a = tree.xpath(xpath, namespaces={
        "src": "http://www.srcML.org/srcML/src"
    })
    print(len(a))  # prints 0, i.e. no matches

Thanks!

Upvotes: 1

Views: 395

Answers (1)

Daniel Haley

Reputation: 52888

The problem is that when you do:

tree = etree.fromstring(xml)

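etree.fromstring() returns the root element, so tree is already the src:unit element, and a relative XPath is evaluated with that element as its context node. Your XPath is therefore looking for a child src:unit inside src:unit, which doesn't exist. (If you print(tree.tag) you'll see {http://www.srcML.org/srcML/src}unit.)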
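To see the context-node behaviour in isolation, here is a minimal sketch. The cut-down XML and the names NS, root, relative, from_child and absolute are assumptions made for illustration; they are not taken from the question.

```python
from lxml import etree

NS = {"src": "http://www.srcML.org/srcML/src"}

# Cut-down document, assumed for illustration (not the asker's full XML).
xml = b'<unit xmlns="http://www.srcML.org/srcML/src"><class><name>Correct</name></class></unit>'

# fromstring() returns the root element, not a document node.
root = etree.fromstring(xml)
print(root.tag)

# Relative path: searches for a src:unit child of the src:unit root.
relative = root.xpath("src:unit/src:class", namespaces=NS)

# Relative path that starts one level down, at the root's children.
from_child = root.xpath("src:class", namespaces=NS)

# Absolute path: the leading "/" starts at the document root instead.
absolute = root.xpath("/src:unit/src:class", namespaces=NS)

print(len(relative), len(from_child), len(absolute))
```

Only the first query comes back empty. An absolute path with a leading "/" should be another way to keep src:unit in the expression, if you prefer that to dropping the first step.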

Try starting the XPath at src:class instead, keeping the namespaces argument from your Edit 1:

xpath = '''src:class[((descendant-or-self::src:decl_stmt/src:decl[(src:type[src:name[text()='JsonWebToken']] and src:annotation[src:name[text()='Inject']])]) and (src:annotation[src:name[text()='Path']]))]'''
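Put together with the XML from the question, the corrected query can be checked with a short script like the following. This is a sketch rather than a definitive version; the variable names NS and matches are my own.

```python
from lxml import etree

# XML copied from the question; encoded to bytes because it carries
# an XML declaration with an explicit encoding.
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" revision="0.9.5" language="Java" filename="File.java"><package>package <name><name>com</name><operator>.</operator><name>samples</name><operator>.</operator><name>e978092668</name></name>;</package>
<class><annotation>@<name>Path</name></annotation>
<specifier>public</specifier> class <name>Correct</name> <block>{
    <decl_stmt><decl><annotation>@<name>Inject</name></annotation>
    <specifier>private</specifier> <type><name>JsonWebToken</name></type> <name>field</name></decl>;</decl_stmt>
}</block></class>
</unit>'''.encode("UTF-8")

# Prefix-to-URI mapping for the document's default namespace.
NS = {"src": "http://www.srcML.org/srcML/src"}

# Same predicate as before, but the path starts at src:class because
# the context node (tree) is already the src:unit element.
xpath = '''src:class[((descendant-or-self::src:decl_stmt/src:decl[(src:type[src:name[text()='JsonWebToken']] and src:annotation[src:name[text()='Inject']])]) and (src:annotation[src:name[text()='Path']]))]'''

tree = etree.fromstring(xml)
matches = tree.xpath(xpath, namespaces=NS)

print(len(matches))
for element in matches:
    print(element.tag)
```

The namespaces mapping is still required: the elements live in the document's default namespace, and XPath 1.0 has no notion of a default namespace for unprefixed names.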

Upvotes: 2
