rugantio

Reputation: 51

Scrapy - change settings at runtime based on attribute provided

I'm having fun with Scrapy, working on this project, a spider for Facebook posts.

I would like to change the CONCURRENT_REQUESTS setting from settings.py at runtime if a boolean attribute is provided.

I tried overriding the from_crawler method as follows, but it does not seem to work:

@classmethod
def from_crawler(cls, crawler, **kwargs):
    settings = cls(crawler.settings)
    if 'conc' in kwargs:
        settings.set('CONCURRENT_REQUESTS',32)
    return settings

Can you please show me how to do it properly, and also how to change __init__? Should I move all of the attribute parsing into from_crawler? Thx!

Upvotes: 3

Views: 1491

Answers (3)

ankostis

Reputation: 9473

Based on this informative issue #4196, combined with the telnet console, it is possible to do this even after the crawl has started.

Attach a telnet client using the port (e.g. 6023) and the password that are logged when the scrapy crawl command is launched, then issue the following interactive Python statements to modify the currently running downloader:

$ telnet  127.0.0.1  6023  # Read the actual port from logs.
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Username: scrapy
Password: <copy-from-logs>
>>> engine.downloader.total_concurrency
8
>>> engine.downloader.total_concurrency = 32
>>> est()
Execution engine status

time()-engine.start_time                        : 14226.62803554535
engine.has_capacity()                           : False
len(engine.downloader.active)                   : 28
engine.scraper.is_idle()                        : False
engine.spider.name                              : <foo>
engine.spider_is_idle(engine.spider)            : False
engine.slot.closing                             : False
len(engine.slot.inprogress)                     : 32
len(engine.slot.scheduler.dqs or [])            : 531
len(engine.slot.scheduler.mqs)                  : 0
len(engine.scraper.slot.queue)                  : 0
len(engine.scraper.slot.active)                 : 0
engine.scraper.slot.active_size                 : 0
engine.scraper.slot.itemproc_size               : 0
engine.scraper.slot.needs_backout()             : False

The same statements can also be written as code in the spider's parse() method, where the engine is reachable as self.crawler.engine.

Upvotes: 1

Georgiy

Reputation: 3561

The CONCURRENT_REQUESTS setting is read into the total_concurrency attribute of the downloader (see scrapy.core.downloader).
The settings themselves are immutable once the crawl is running, but the downloader object is mutable.

You can dynamically change this value from spider methods.

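class FacebookSpider(scrapy.Spider):
.......    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # self.crawler is not set yet inside __init__, so only record the flag here.
        self.conc = 'conc' in kwargs

    def parse(self, response):
        # The engine and its downloader exist once the crawl is running.
        if self.conc:
            self.crawler.engine.downloader.total_concurrency = 32

....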

Upvotes: 1

rugantio

Reputation: 51

I just noticed that I can simply pass "-s CONCURRENT_REQUESTS=32" on the command line at runtime. Another option would have been to override the update_settings method; here's a reference for anybody who runs into this issue: Update scrapy settings based on spider property

Upvotes: 0
