Jack Fleeting
Jack Fleeting

Reputation: 24928

Problems with '._ElementUnicodeResult'

While trying to help another user out with some question, I ran into the following problem myself:

The object is to find the country of origin of a list of wines on the page. So we start with:

import requests
from lxml import etree

url = "https://www.winepeople.com.au/wines/Dry-Red/_/N-1z13zte"
res = requests.get(url)
content = res.content
res = requests.get(url)

tree = etree.fromstring(content, parser=etree.HTMLParser())
tree_struct = etree.ElementTree(tree)

Next, for reasons I'll get into in a separate question, I'm trying to compare the xpath of two elements with certain attributes. So:

wine = tree.xpath("//div[contains(@class, 'row wine-attributes')]")
country = tree.xpath("//div/text()[contains(., 'Australia')]")

So far, so good. What are we dealing with here?

type(wine),type(country)
>> (list, list)

They are both lists. Let's check the type of the first element in each list:

type(wine[0]),type(country[0])
>> (lxml.etree._Element, lxml.etree._ElementUnicodeResult)

And this is where the problem starts. Because, as mentioned, I need to find the xpath of the first elements of the wine and country lists. And when I run:

tree_struct.getpath(wine[0])

The output is, as expected:

'/html/body/div[13]/div/div/div[2]/div[6]/div[1]/div/div/div[2]/div[2]'

But with the other:

tree_struct.getpath(country[0])

The output is:

TypeError: Argument 'element' has incorrect type (expected 
       lxml.etree._Element, got lxml.etree._ElementUnicodeResult)

I couldn't find much information about _ElementUnicodeResult), so what is it? And, more importantly, how do I fix the code so that I get an xpath for that node?

Upvotes: 1

Views: 935

Answers (1)

Daniel Haley
Daniel Haley

Reputation: 52868

You're selecting a text() node instead of an element node. This is why you end up with a lxml.etree._ElementUnicodeResult type instead of a lxml.etree._Element type.

Try changing your xpath to the following in order to select the div element instead of the text() child node of div...

country = tree.xpath("//div[contains(., 'Australia')]")

Upvotes: 3

Related Questions