user2728494

Reputation: 131

Scrapy follow pagination in second level callback

I have this code, and I need to follow all pagination links in my function parse_additional_info:

from scrapy import Request, Selector

start_urls = ['http://example.com']

def parse_start_url(self, response):
    sel = Selector(response)
    # XPath truncated in the original post
    aa = sel.xpath('//h3/a...../@href').extract()
    for a in aa:
        yield Request(url=a, callback=self.parse_additional_info)

def parse_additional_info(self, response):
    sel = Selector(response)
    nextPageLinks = sel.xpath("//a[contains(text(), 'Next')]/@href").extract()

Please note: I already tried Scrapy rules, and that did not work since it's a chain of callbacks.

Upvotes: 1

Views: 412

Answers (1)

user2728494

Reputation: 131

I found the answer myself. I had to use the urljoin method of the response object with the nextPageLinks URL and call back the same function until no pages were left. Here is the code; it may help someone with the same scenario.

def parse_additional_info(self, response):
    # ... item extraction elided in the original post ...
    nextPageLinks = response.xpath("//a[contains(text(), 'Next')]/@href").extract()

    if nextPageLinks:
        url = response.urljoin(nextPageLinks[0])
        yield Request(url=url, callback=self.parse_additional_info)
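
For anyone piecing the two levels together, here is a minimal self-contained sketch of the whole pattern. It uses a plain Spider with parse rather than the CrawlSpider parse_start_url hook from the question, and the spider name and the //h3/a XPath are placeholders (the question's XPath was truncated), so adapt them to the actual site:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"  # placeholder name, not from the original post
    start_urls = ["http://example.com"]

    def parse(self, response):
        # First level: follow each detail link found on the listing page.
        for href in response.xpath("//h3/a/@href").extract():
            yield scrapy.Request(response.urljoin(href),
                                 callback=self.parse_additional_info)

    def parse_additional_info(self, response):
        # Second level: scrape the page here, then follow the "Next" link
        # back into this same callback until no such link is found.
        nextPageLinks = response.xpath(
            "//a[contains(text(), 'Next')]/@href").extract()
        if nextPageLinks:
            yield scrapy.Request(response.urljoin(nextPageLinks[0]),
                                 callback=self.parse_additional_info)

Note that response.urljoin requires Scrapy 1.0 or later; it resolves relative next-page hrefs against the current page URL, which is why it works here without manually building absolute URLs.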

Upvotes: 1
