lxml xpath How to deal with xml entities

Question

I use lxml (Python 3.7.1) to parse an XML document containing XML entities. I can't find the right syntax to query an element whose attribute value contains XML entities (&quot;, &apos;, etc.).
See the following code:

from lxml import etree

root = etree.XML('''

    
        
    
    
        
    

''')

item = root.xpath(".//item[@name='abcd']") # 1
# item = root.xpath(".//item[@name='hi&apos;jk']") # 2
# item = root.xpath(".//item[@name='hi'jk']") # 3
# item = root.xpath('.//item[@name="hi''jk"]') # 4
if len(item) != 0:
    print(len(item))
    print(item)
    name = item[0].xpath(".//@name")
    print(name)
else:
    print("Nothing")

When line 1 is uncommented, the code works fine.

When line 2 (or 3, or 4) is uncommented instead (with the other lines commented out), nothing is found.

Why is this the case?

Thanks.

willeM_ Van Onsem · Accepted Answer

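Here &apos; is part of the encoding in the XML file. The parser decodes it to a plain apostrophe ('), so the attribute value in the parsed tree is hi'jk, and that is the string the XPath query has to match.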
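To see what the parser does with the entity, here is a minimal sketch. It assumes a hypothetical one-element document (the element and attribute are illustrative, not the question's full XML) and only inspects the attribute value after parsing.

```python
from lxml import etree

# Hypothetical one-element document, used only for illustration.
elem = etree.XML('<item name="hi&apos;jk"/>')

# After parsing, the entity has been replaced by the character it encodes.
name = elem.get("name")
print(name)               # hi'jk
print("&apos;" in name)   # False
```

The entity exists only in the serialized XML text; the tree that XPath runs against no longer contains it.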

In the XPath query, you should use:

>>> root.xpath(""".//item[@name="hi'jk"]""")
[<Element item at 0x...>]
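The same lookup can also be written without any quoting tricks by passing the value as an XPath variable, which lxml accepts as a keyword argument to xpath(). The sketch below assumes a hypothetical stand-in document with two item elements; it is not the question's original XML.

```python
from lxml import etree

# Hypothetical stand-in for the question's document.
root = etree.XML('''
<root>
    <item name="abcd"/>
    <item name="hi&apos;jk"/>
</root>
''')

# Option 1: delimit the XPath string literal with double quotes,
# so the apostrophe can appear inside it unescaped.
by_literal = root.xpath(""".//item[@name="hi'jk"]""")

# Option 2: pass the value as an XPath variable; no quoting is needed.
by_variable = root.xpath(".//item[@name=$name]", name="hi'jk")

# XPath has no entity syntax, so this compares against the literal
# ten-character string hi&apos;jk and matches nothing.
by_entity = root.xpath(".//item[@name='hi&apos;jk']")

print(len(by_literal), len(by_variable), len(by_entity))  # 1 1 0
```

The variable form is usually the safer choice when the value comes from user input or may contain both kinds of quote.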
