Vasim

Reputation: 257

Rule for pagination in Scrapy

I am using Scrapy to extract information from a website. This is the spider code (partial):

class bsSpider(CrawlSpider):
    name = "bsSpider"
    def __init__(self, *args, **kwargs): 
        super(bsSpider, self).__init__(*args, **kwargs) 
        self.start_urls = [kwargs.get('start_url')]

    rules = (
        Rule(LinkExtractor(allow=(r'.*\?id1=.*',),
                           restrict_xpaths=('//a[@class="prevNext next"]',)),
             callback="parse_items", follow=True),
    )

Based on the above rule, the spider follows the "next" pages. Now, if a user wants to provide another start_url to scrape, how can I update the above rule dynamically? Any help will be appreciated.

Upvotes: 0

Views: 811

Answers (1)

kev

Reputation: 161614

Take a look at the constructor of CrawlSpider:

class CrawlSpider(Spider):

    rules = ()

    def __init__(self, *a, **kw):
        super(CrawlSpider, self).__init__(*a, **kw)
        self._compile_rules()

As you can see, CrawlSpider compiles its rules once, during __init__. So if you change self.rules after the spider has been initialized, you need to call self._compile_rules() manually to recompile them.

Upvotes: 1
