user2571599

Reputation: 13

Can't override settings in Scrapy after the init function of a spider

Is it possible to override Scrapy settings after the __init__ function of a spider? For example, if I want to get settings from a database and pass my query parameters as arguments from the command line.

def __init__(self, spider_id, **kwargs):
    self.spider_id = spider_id
    self.set_params(spider_id)
    super(Base_Crawler, self).__init__(**kwargs)

def set_params(self, spider_id):
    # TODO
    # make a query in the db
    # get variables from the query result
    # override settings

Upvotes: 0

Views: 1988

Answers (1)

mizhgun

Reputation: 1888

Technically you can "override" settings after initialization of the spider, but it would have no effect, because most of them are applied earlier.
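For illustration, a minimal sketch of why a late override fails (behavior of current Scrapy versions, where the crawler freezes its Settings object before the spider runs):

class DemoSpider(scrapy.Spider):
    name = 'demo'

    def start_requests(self):
        # By the time the spider is running, the crawler has already
        # frozen its settings, so writing to them raises instead of
        # taking effect:
        try:
            self.settings.set('DOWNLOAD_DELAY', 5.0)
        except TypeError as exc:
            self.logger.info('cannot override this late: %s', exc)
        return []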

What you can actually do is pass parameters to the spider as command-line options using -a and override project settings using -s, for example:

Spider:

class TheSpider(scrapy.Spider):
    name = 'thespider'

    def __init__(self, *args, **kwargs):
        self.spider_id = kwargs.pop('spider_id', None)
        super(TheSpider, self).__init__(*args, **kwargs)

CLI:

scrapy crawl thespider -a spider_id=XXX -s SETTING_TO_OVERRIDE=YYY
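Inside the spider, the overridden value is then visible through self.settings, which the crawler binds right after __init__. A minimal sketch, where SETTING_TO_OVERRIDE is just the placeholder name from the command line above:

import scrapy

class TheSpider(scrapy.Spider):
    name = 'thespider'

    def start_requests(self):
        # self.settings is bound by the crawler after __init__, so by
        # the time start_requests runs it reflects the -s override.
        value = self.settings.get('SETTING_TO_OVERRIDE')
        self.logger.info('SETTING_TO_OVERRIDE = %s', value)
        return []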

If you need something more advanced, consider writing a custom runner that wraps your spider. Below is an example from the docs:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# 'followall' is the name of one of the spiders of the project.
process.crawl('followall', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished

Just replace get_project_settings with your own routine that returns a Settings instance.
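A minimal sketch of such a routine, assuming a hypothetical load_overrides_from_db(spider_id) helper that returns a dict of setting names to values (the database access itself is up to you):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def get_db_settings(spider_id):
    # Start from the project settings, then layer the DB values on top.
    settings = get_project_settings()
    # load_overrides_from_db is hypothetical: fetch {name: value} pairs
    # for this spider_id from your database however you like.
    for name, value in load_overrides_from_db(spider_id).items():
        settings.set(name, value, priority='cmdline')
    return settings

process = CrawlerProcess(get_db_settings(spider_id='XXX'))
process.crawl('thespider', spider_id='XXX')
process.start()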

Anyway, avoid overloading the spider's code with non-scraping logic, to keep it clean and reusable.

Upvotes: 1
