Reputation: 675
This should be easy but I'm stuck.
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text Next ></a>
</div>
I'm trying to use Scrapy (BaseSpider) to select a link based on its link text using:
nextPage = HtmlXPathSelector(response).select("//div[@class='paginationControl']/a/@href").re("(.+)*?Next")
For example, I want to select the next page link based on the fact that its text is "Link Text Next". Any ideas?
Upvotes: 11
Views: 10052
Reputation: 879471
Use a[contains(text(),'Link Text Next')]:
nextPage = HtmlXPathSelector(response).select(
"//div[@class='paginationControl']/a[contains(text(),'Link Text Next')]/@href")
Reference: Documentation on the XPath contains function
PS. In your markup, the link text continues after "Next" (a space followed by ">"). An exact comparison would have to include those trailing characters:
text()="Link Text Next >"
I think using contains is a bit more general while still being specific enough.
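The difference between a substring match and an exact match can be tried outside Scrapy. The sketch below is only an illustration, not the Scrapy API: it uses Python's standard-library ElementTree, whose XPath subset has no contains(), so the text comparison is done in plain Python. The helper name hrefs_by_link_text and the body wrapper are hypothetical, the sample is shortened to three links, and "&" and ">" are escaped so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Shortened pagination markup from the question, wrapped in <body> and with
# "&" / ">" escaped so that ElementTree can parse it as XML.
HTML = """
<body>
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&amp;powerunit=2">Link Text 3</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&amp;powerunit=2">Link Text Next &gt;</a>
</div>
</body>
"""


def hrefs_by_link_text(markup, needle, exact=False):
    """Return hrefs of pagination links whose text matches `needle`.

    exact=False mimics contains(text(), needle); exact=True mimics text()=needle.
    """
    root = ET.fromstring(markup)
    matches = []
    for a in root.findall(".//div[@class='paginationControl']/a"):
        text = a.text or ""
        matched = (text == needle) if exact else (needle in text)
        if matched:
            matches.append(a.get("href"))
    return matches


substring_hits = hrefs_by_link_text(HTML, "Link Text Next")
exact_hits = hrefs_by_link_text(HTML, "Link Text Next", exact=True)
print(substring_hits)  # the "Next" link's href
print(exact_hits)      # empty: the real text is "Link Text Next >"
```

With this sample, the substring match finds the "Next" link, while the exact match finds nothing unless the needle is the full text "Link Text Next >".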
Upvotes: 16
Reputation: 76755
You can use the following XPath expression:
//div[@class='paginationControl']/a[text()="Link Text Next"]/@href
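This selects the href attribute of the link whose text is exactly "Link Text Next". Because text()= is an exact comparison, the literal must match the whole link text, including any trailing characters such as the " >" in the markup above.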
See XPath string functions if you need more control.
Upvotes: 6
Reputation: 10988
Your XPath is selecting the href, not the text in the a tag. From your example, it doesn't look like the href has "Next" in it, so you can't find it with an RE.
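A quick way to check this is to search the href values and the link texts separately. This is a minimal standard-library sketch, assuming the (href, text) pairs are copied by hand from the question's markup. The LINKS list and variable names are made up for illustration, and a plain "Next" pattern is used here in place of the question's regular expression.

```python
import re

BASE = "/en/overview/0-All_manufactures/0-All_models.html"

# (href, link text) pairs copied from the question's pagination block
LINKS = [
    (BASE + "?page=2&powerunit=2", "Link Text 2"),
    (BASE + "?page=3&powerunit=2", "Link Text 3"),
    (BASE + "?page=4&powerunit=2", "Link Text 4"),
    (BASE + "?page=5&powerunit=2", "Link Text 5"),
    (BASE + "?page=2&powerunit=2", "Link Text Next >"),
]

# Searching the href strings: none of them contains "Next".
hrefs_matching_href = [href for href, _ in LINKS if re.search(r"Next", href)]

# Searching the link text instead: only the last link matches.
hrefs_matching_text = [href for href, text in LINKS if re.search(r"Next", text)]

print(hrefs_matching_href)
print(hrefs_matching_text)
```

The first list comes back empty because "Next" only appears in the link text, which is why the filter has to be applied to the text of the a element before taking @href.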
Upvotes: 1