omrcm

Reputation: 31

Scrapy parse pagination without next link

I'm trying to parse pagination that has no "next" link. The HTML is below:

<div id="pagination" class="pagination">
    <ul>
        <li>
            <a href="//www.demopage.com/category_product_seo_name" class="page-1 ">1</a>
        </li>
        <li>
            <a href="//www.demopage.com/category_product_seo_name?page=2" class="page-2 ">2</a>
        </li>
        <li>
            <a href="//www.demopage.com/category_product_seo_name?page=3" class="page-3 ">3</a>
        </li>
        <li>
            <a href="//www.demopage.com/category_product_seo_name?page=4" class="page-4 active">4</a>
        </li>
        <li>
            <a href="//www.demopage.com/category_product_seo_name?page=5" class="page-5">5</a>
        </li>
        <li>
            <a href="//www.demopage.com/category_product_seo_name?page=6" class="page-6 ">6</a>
        </li>
        <li>
                <span class="page-... three-dots">...</span>
        </li>
        <li>
           <a href="//www.demopage.com/category_product_seo_name?page=50" class="page-50 ">50</a>
        </li>
    </ul>   
</div>

For this HTML I have tried these XPath expressions:

response.xpath('//div[@class="pagination"]/ul/li/a/@href').extract()
or 
response.xpath('//div[@class="pagination"]/ul/li/a/@href/following-sibling::a[1]/@href').extract()

Is there a good way to parse this pagination? Thanks to all.

PS: I have checked these answers too:

Answer 1

Answer 2

Upvotes: 1

Views: 640

Answers (2)

Ikram Khan Niazi

Reputation: 807

You can simply get all the pagination links and follow them in a loop: each time you run the code below on a page, the selector returns the pagination links available on that page. You don't need to worry about duplicate URLs, as Scrapy filters duplicate requests for you by default. You could also use Scrapy Rules (with a CrawlSpider).

 response.css('.pagination ::attr(href)').getall()
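
A minimal sketch of that approach, assuming a spider whose parse callback re-yields every pagination link it finds (the spider name and start URL are placeholders):

import scrapy


class DemoSpider(scrapy.Spider):
    # Hypothetical name and start URL for illustration.
    name = 'demo'
    start_urls = ['https://www.demopage.com/category_product_seo_name']

    def parse(self, response):
        # ... extract items from the current page here ...

        # Follow every pagination link; Scrapy's built-in duplicate
        # filter drops URLs that were already requested.
        for href in response.css('.pagination ::attr(href)').getall():
            yield response.follow(href, callback=self.parse)

response.follow also resolves the protocol-relative hrefs from the question (//www.demopage.com/...) against the current page's scheme.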

Upvotes: 0

Felix Eklöf

Reputation: 3730

One solution is to scrape a fixed number of pages, but this isn't always a good fit if the total number of pages isn't constant:

import scrapy


class MySpider(scrapy.Spider):
    name = 'demo'  # spider name required by Scrapy
    num_pages = 10

    def start_requests(self):
        requests = []
        for i in range(1, self.num_pages + 1):
            requests.append(scrapy.Request(
                url='https://www.demopage.com/category_product_seo_name?page={0}'.format(i)
            ))
        return requests

    def parse(self, response):
        # Parse pages here.
        pass
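
Since all the page URLs are generated up front, Scrapy schedules these requests concurrently (subject to the CONCURRENT_REQUESTS setting) rather than discovering each page one at a time.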

Update

You can also keep track of the page count and do something like this. a[href*="?page=2"]::attr(href) will target a elements whose href attribute contains the specified substring (the CSS *= operator; ~= only matches whole space-separated words, so it wouldn't match here). (I'm not currently able to test if this code works, but something in the style of this should do it.)

import scrapy


class MySpider(scrapy.Spider):
    name = 'demo'  # spider name required by Scrapy
    start_urls = ['https://demopage.com/search?p=1']
    page_count = 1

    def parse(self, response):
        self.page_count += 1
        # ... parse the response here ...

        # Select the link whose href contains "?page=<next page number>".
        next_url = response.css(
            '#pagination > ul > li > a[href*="?page={0}"]::attr(href)'.format(self.page_count)
        ).get()
        if next_url:
            yield response.follow(next_url, callback=self.parse)
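
Note that the hrefs in the question are protocol-relative (//www.demopage.com/...), so resolving them with response.follow (or response.urljoin) matters; passing the raw selector result straight to scrapy.Request would raise a missing-scheme error.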

Upvotes: 2
