Reputation: 154
I've written an XPath expression to extract a specific piece of text from some HTML elements using Scrapy, but it doesn't return the text I'm after.
Html elements:
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
I wish to extract R 3500 out of it.
I've tried with:
from scrapy import Selector
html = """
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
"""
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following::*").get()
print(rate)
When I run the script above, I get <br class="hidden-md hidden-lg">, whereas I wish to get R 3500.
I could have used .tail if I had opted for lxml. However, with Scrapy I can't find anything similar.
How can I extract that rate out of the html elements using xpath?
Upvotes: 2
Views: 173
Reputation: 22617
To complement the accepted answer, which is entirely correct, here is an explanation of why
//*[@class='rates']/label/following::*
given the document
<div class="rates">
<label>
Rates :
</label>
R 3500
<br class="hidden-md hidden-lg">
</div>
does not return the text R 3500: * only selects element nodes that follow the label element, not text nodes. Elements and text nodes are different kinds of node in the XPath data model. You can test this claim with a slightly different document:
<div class="rates">
<label>
Rates :
</label>
<any>R 3500</any>
<br class="hidden-md hidden-lg">
</div>
This causes your code to return the any element.
Both text() (more specific) and node() (more general) select this text node, and in this case both the following:: and following-sibling:: axes work.
Upvotes: 1
Reputation: 92854
To get the text node that is a following sibling of the label node:
...
sel = Selector(text=html)
rate = sel.xpath("//*[@class='rates']/label/following-sibling::text()").get().strip()
print(rate)
The output:
R 3500
Addition: "//*[@class='rates']/label/following::text()" should also work.
https://www.w3.org/TR/1999/REC-xpath-19991116#axes
Upvotes: 3