Reputation: 257
I am using Scrapy to extract information from a website. This is the spider code (partial):
class bsSpider(CrawlSpider):
    name = "bsSpider"

    def __init__(self, *args, **kwargs):
        super(bsSpider, self).__init__(*args, **kwargs)
        self.start_urls = [kwargs.get('start_url')]

    rules = (
        Rule(
            LinkExtractor(
                allow=(r'.*\?id1=.*',),
                restrict_xpaths=('//a[@class="prevNext next"]',),
            ),
            callback="parse_items",
            follow=True,
        ),
    )
Based on the above rule, the spider follows the next pages. Now, if the user wants to provide another start_url to scrape, how can the above rule be updated dynamically? Any help would be appreciated.
Upvotes: 0
Views: 811
Reputation: 161614
Take a look at the constructor of CrawlSpider:
class CrawlSpider(Spider):
    rules = ()

    def __init__(self, *a, **kw):
        super(CrawlSpider, self).__init__(*a, **kw)
        self._compile_rules()
As you can see, if you change self.rules somewhere, you need to call self._compile_rules() manually to recompile the rules.
Upvotes: 1