Reputation: 117
I want to identify the "next page" link of a multi-page website with Scrapy. I have the feeling that I cannot do it the usual way because the href attribute is empty (href=""). See here:
<div class="publicusers-page-navigation page-navigation">
<a href="" class="current" data-page-index="1">1</a>
<a href="" data-page-index="2">2</a><a href="" data-page-index="3">3</a>
<i>...</i>
<a href="" data-page-index="330">330</a>
<a href="" class="pagination-next" data-page-index="2">►</a>
</div>
I tried
response.css('div.page-navigation > a::attr(href)').extract_first()
but it's not working.
I'd appreciate any help, as I've been struggling with this problem for a while.
Upvotes: 1
Views: 122
Reputation: 1620
You can simply generate the page URLs yourself, then request and parse each one.
# Pages are numbered 1 to 330 (the highest data-page-index in the pagination).
for page in range(1, 331):
    url = ('https://www.vdma.org/mitglieder'
           '?p_p_lifecycle=2&p_p_resource_id=getPage&p_p_id'
           '=vdma2publicusers_WAR_vdma2publicusers&s=&page=' + str(page))
    print(url)
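If you would rather not hard-code 330, the last page number can probably be read from the data-page-index attributes in the pagination block from the question. The following is only a minimal sketch under that assumption: `page_url` and `all_page_urls` are hypothetical helper names, and the query parameters are copied from the URL above rather than verified against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.vdma.org/mitglieder"


def page_url(page):
    """Build the URL of one result page (query parameters copied from the answer)."""
    params = {
        "p_p_lifecycle": 2,
        "p_p_resource_id": "getPage",
        "p_p_id": "vdma2publicusers_WAR_vdma2publicusers",
        "s": "",
        "page": page,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def all_page_urls(page_indexes):
    """Return one URL per page, from 1 up to the highest index found.

    page_indexes is the list of data-page-index strings. In a Scrapy callback
    it could come from (assuming the markup shown in the question):
        response.css('div.page-navigation > a::attr(data-page-index)').getall()
    """
    last_page = max(int(index) for index in page_indexes)
    return [page_url(page) for page in range(1, last_page + 1)]


if __name__ == "__main__":
    # Values as they appear in the HTML snippet from the question.
    urls = all_page_urls(["1", "2", "3", "330", "2"])
    print(len(urls))
    print(urls[0])
```

Inside a spider you could then yield `scrapy.Request(url, callback=self.parse)` for each generated URL. Selecting `::attr(data-page-index)` instead of `::attr(href)` is what makes this work, since the href attributes are empty.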
Upvotes: 1