Reputation: 603
I am trying to query with XPath an html document parsed with lxml. The document is a straight html-only download of the page about Plastic in Wikipedia. Then I parse it with lxml disabling entity substitution to avoid an error with '®'
from lxml import etree
root = etree.parse("plastic.html",etree.XMLParser(resolve_entities=False))
Then, I retrieve the namespace url
htmltag = root.iter().next()
nsurl = htmltag.nsmap.values()[0]
Now, I would like to use xpath queries on either 'root' or 'htmltag', but I am unable to do so. I have tried different ways, but the following seems to me the most correct form, which yields errors anyway.
root.xpath('//ns:body',namespace={'ns',nsurl})
And this is what I get
XPathResultError: Unknown return type: dict
I am running the commands in an IPython console, but I don't think that might be the problem. What am I doing wrong?
Upvotes: 11
Views: 3530
Reputation: 485
This is a simple miss spell. You should use namespaces
instead of namespace
.
Upvotes: 37