Reputation: 13
Is it possible to override Scrapy settings after the init function of a spider? For example, if I want to get settings from a db and I pass my query parameters as arguments from the cmdline.
def __init__(self, spider_id, **kwargs):
    self.spider_id = spider_id
    self.set_params(spider_id)
    super(Base_Crawler, self).__init__(**kwargs)

def set_params(self, spider_id):
    #TODO
    #make a query in the db
    #get settings variables from the query result
    #override settings
Upvotes: 0
Views: 1988
Reputation: 1888
Technically you can "override" settings after a spider has been initialized, but it would have no effect, because most of them have already been applied by then.
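If the overrides are known at class-definition time, Scrapy's documented custom_settings class attribute is the supported per-spider hook: it is read before the crawler is configured, so the values actually take effect. A minimal sketch (the setting values here are just examples):
import scrapy

class TheSpider(scrapy.Spider):
    name = 'thespider'
    # Must be a class attribute, not set in __init__: Scrapy reads it
    # before the spider instance exists.
    custom_settings = {
        'DOWNLOAD_DELAY': 2.0,
    }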
For values that are only known at run time, what you can actually do is pass parameters to the spider as command-line options using -a, and override project settings using -s. For example:
Spider:
class TheSpider(scrapy.Spider):
    name = 'thespider'

    def __init__(self, *args, **kwargs):
        self.spider_id = kwargs.pop('spider_id', None)
        super(TheSpider, self).__init__(*args, **kwargs)
CLI:
scrapy crawl thespider -a spider_id=XXX -s SETTING_TO_OVERRIDE=YYY
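Inside the spider, both values are then visible: the -a argument as an instance attribute, and the -s override through self.settings, which is populated once the spider is bound to a crawler. A minimal sketch (SETTING_TO_OVERRIDE is just a placeholder name, not a real Scrapy setting):
import scrapy

class TheSpider(scrapy.Spider):
    name = 'thespider'

    def __init__(self, *args, **kwargs):
        self.spider_id = kwargs.pop('spider_id', None)
        super(TheSpider, self).__init__(*args, **kwargs)

    def start_requests(self):
        # self.settings is bound by the crawler and reflects any -s overrides
        value = self.settings.get('SETTING_TO_OVERRIDE')
        self.logger.info('spider_id=%s, SETTING_TO_OVERRIDE=%s',
                         self.spider_id, value)
        return []  # yield your real Requests here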
If you need something more advanced, consider writing a custom runner that wraps your spider. Below is an example from the docs:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
# 'followall' is the name of one of the spiders of the project.
process.crawl('followall', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished
Just replace get_project_settings with your own routine that returns a Settings instance.
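For instance, such a routine could layer database values on top of the project settings. A minimal sketch, where get_settings_from_db is a hypothetical helper standing in for your actual query logic:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def get_settings_from_db(spider_id):
    # Hypothetical helper: query your database and return a dict of
    # overrides for this spider, e.g. {'DOWNLOAD_DELAY': 2.5}
    return {}

def build_settings(spider_id):
    settings = get_project_settings().copy()
    # 'cmdline' is the highest standard priority, so these values win
    # over anything set in settings.py
    settings.update(get_settings_from_db(spider_id), priority='cmdline')
    return settings

process = CrawlerProcess(build_settings('XXX'))
process.crawl('thespider', spider_id='XXX')
process.start()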
In any case, avoid overloading the spider's code with non-scraping logic, to keep it clean and reusable.
Upvotes: 1