Reputation: 131
I have this code and I need to follow all pagination links in my function parse_additional_info:
    from scrapy.http import Request
    from scrapy.selector import Selector

    start_urls = ['http://example.com']

    def parse_start_url(self, response):
        sel = Selector(response)
        aa = sel.xpath('//h3/a...../@href').extract()
        for a in aa:
            yield Request(url=a, callback=self.parse_additional_info)

    def parse_additional_info(self, response):
        sel = Selector(response)
        nextPageLinks = sel.xpath("//a[contains(text(), 'Next')]/@href").extract()
Please note: I have already tried Scrapy rules and that did not work, since it's a chain of callbacks.
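A Rule-based version would look roughly like the sketch below (the class name and XPath patterns are placeholders, not my real code). Rules are only applied to responses that pass through the CrawlSpider machinery itself, so pagination links on pages reached through a manually yielded chain of callbacks never get picked up by them:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class ExampleSpider(CrawlSpider):
        name = 'example'
        start_urls = ['http://example.com']

        rules = (
            # Extract detail-page links from the listing page.
            Rule(LinkExtractor(restrict_xpaths='//h3/a'),
                 callback='parse_additional_info'),
            # Follow "Next" links, but only on pages the rule machinery
            # processes, not on pages reached via a custom callback.
            Rule(LinkExtractor(restrict_xpaths="//a[contains(., 'Next')]"),
                 follow=True),
        )

        def parse_additional_info(self, response):
            pass  # scrape the detail page here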
Upvotes: 1
Views: 412
Reputation: 131
I found the answer myself. I had to use the urljoin method of the response object with the nextPageLinks URL and call back into the same function until no pages are left. Here is the code; it may help someone with the same scenario.
    def parse_additional_info(self, response):
        # ... same extraction as before ...
        if nextPageLinks:
            url = response.urljoin(nextPageLinks[0])
            yield Request(url=url, callback=self.parse_additional_info)
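For completeness, a self-contained spider combining both callbacks might look roughly like this (a minimal sketch using a plain scrapy.Spider with parse as the entry point; the XPath selectors are placeholders carried over from the snippets above):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = 'example'
        start_urls = ['http://example.com']

        def parse(self, response):
            # Follow each detail-page link from the listing page.
            for href in response.xpath('//h3/a/@href').extract():
                yield scrapy.Request(response.urljoin(href),
                                     callback=self.parse_additional_info)

        def parse_additional_info(self, response):
            # ... scrape the fields you need from this page ...

            # Keep requesting the "Next" page with the same callback;
            # the chain stops on the last page, where no link matches.
            nextPageLinks = response.xpath(
                "//a[contains(text(), 'Next')]/@href").extract()
            if nextPageLinks:
                yield scrapy.Request(response.urljoin(nextPageLinks[0]),
                                     callback=self.parse_additional_info)

Because each response yields at most one follow-up request, the recursion terminates naturally once a page has no "Next" link.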
Upvotes: 1