Reputation: 154
I've been trying to extract a portion of text from some html elements using xpath, but I seem to be going wrong somewhere and can't make it work.
HTML elements:
htmlelem = """
<div class="content">
<p>Type of cuisine: </p>International
</div>
"""
I would like to extract International using xpath. I know I could succeed using .next_sibling if I wanted to extract it with a css selector, but I'm not interested in going that route. That said, I can get the same text with xpath if I try like this:
tree.xpath("//*[@class='content']/p/following::text()")[0]
But the above expression is not what I'm after, because I can't use it within selenium webdriver if I stick to driver.find_element_by_xpath(), which expects the xpath to select an element rather than a text node.
The only form I'm interested in is like the following, but it is not working:
"//*[@class='content']/p/following::*"
Real-life example:
from lxml.html import fromstring
htmlelem = """
<div class="content">
<p>Type of cuisine: </p>International
</div>
"""
tree = fromstring(htmlelem)
item = tree.xpath("//*[@class='content']/p/following::text()")[0].strip()
elem = tree.xpath("//*[@class='content']/p/following::*")[0].text
print(elem)
In the above example, I can print item successfully, but printing elem fails. I would like to modify the expression used for elem. How can I make it work so that I can use the same xpath within the lxml library or within selenium?
Upvotes: 2
Views: 283
Reputation: 24930
Since OP was looking for a solution which extracts the text from outside the xpath, the following should do that, albeit in a somewhat awkward manner:
tree.xpath("//*[@class='content']")[0][0].tail
Output:
International
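As a self-contained check, the one-liner can be run against the snippet from the question. This is only a sketch: the names content and text are made up here, and .strip() is added on the assumption that the trailing newline after the text is not wanted.

```python
from lxml.html import fromstring

# HTML snippet copied from the question
htmlelem = """
<div class="content">
<p>Type of cuisine: </p>International
</div>
"""

tree = fromstring(htmlelem)

# First element matching the class, i.e. the <div class="content">
content = tree.xpath("//*[@class='content']")[0]

# content[0] is the <p>; the text after its closing tag lives in .tail
text = content[0].tail.strip()
print(text)
```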
This approach is needed because of the way lxml parses the html code:
tree.xpath("//*[@class='content']") results in a list of length 1.
The first (and only) element in that list, tree.xpath("//*[@class='content']")[0], is an lxml.html.HtmlElement, which can itself be treated as a list and also has length 1.
The tail of the first (and only) element in that lxml.html.HtmlElement holds the desired output: lxml stores the text that follows an element's closing tag in that element's tail attribute.
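The steps above can be walked through one at a time. The sketch below reuses the html from the question; the variable names (matches, div, p, inside, after, following_elements) are illustrative only. It also suggests why the following::* expression from the question gives nothing to index: no element comes after the p in this snippet, only a text node.

```python
from lxml.html import fromstring

# HTML snippet copied from the question
htmlelem = """
<div class="content">
<p>Type of cuisine: </p>International
</div>
"""

tree = fromstring(htmlelem)

matches = tree.xpath("//*[@class='content']")  # list of matching elements
div = matches[0]                               # the <div class="content">
p = div[0]                                     # its only child, the <p>

# Text inside <p>...</p> is stored in .text,
# text between </p> and the next tag is stored in .tail
inside = p.text
after = p.tail.strip()

# following::* selects elements only; nothing but text follows the <p> here
following_elements = tree.xpath("//*[@class='content']/p/following::*")

print(len(matches), len(div))
print(repr(inside), repr(after))
print(following_elements)
```

Because the wanted text is a text node and not an element, an element-only xpath cannot select it directly; reading .tail from the preceding element is one way around that in lxml.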
Upvotes: 2