Reputation: 51
I'm having fun with Scrapy, working on a project: a spider for Facebook posts.
I would like to change the CONCURRENT_REQUESTS parameter from settings.py at runtime if a boolean attribute is provided.
I tried overriding the from_crawler method as follows, but it does not seem to work:
    @classmethod
    def from_crawler(cls, crawler, **kwargs):
        settings = cls(crawler.settings)
        if 'conc' in kwargs:
            settings.set('CONCURRENT_REQUESTS', 32)
        return settings
Can you please show me how to do it properly, and also how to change __init__? Should I move all the attribute parsing into from_crawler? Thanks!
Upvotes: 3
Views: 1491
Reputation: 9473
Based on this informative issue #4196, combined with the telnet console, it is possible to do this even while the crawl is already running.
Attach a telnet client to the port (e.g. 6023) using the password logged when the scrapy crawl command is launched, and issue the following interactive Python statements to modify the currently running downloader:
$ telnet 127.0.0.1 6023 # Read the actual port from logs.
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Username: scrapy
Password: <copy-from-logs>
>>> engine.downloader.total_concurrency
8
>>> engine.downloader.total_concurrency = 32
>>> est()
Execution engine status
time()-engine.start_time : 14226.62803554535
engine.has_capacity() : False
len(engine.downloader.active) : 28
engine.scraper.is_idle() : False
engine.spider.name : <foo>
engine.spider_is_idle(engine.spider) : False
engine.slot.closing : False
len(engine.slot.inprogress) : 32
len(engine.slot.scheduler.dqs or []) : 531
len(engine.slot.scheduler.mqs) : 0
len(engine.scraper.slot.queue) : 0
len(engine.scraper.slot.active) : 0
engine.scraper.slot.active_size : 0
engine.scraper.slot.itemproc_size : 0
engine.scraper.slot.needs_backout() : False
The same statements can be written as code in the spider's parse() method, where the engine is reachable as self.crawler.engine.
Upvotes: 1
Reputation: 3561
The CONCURRENT_REQUESTS setting is used to initialise the total_concurrency attribute of the downloader (scrapy.core.downloader).
The settings themselves are immutable once the crawl is running, but the downloader object is mutable.
You can change this value dynamically from spider methods.
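    class FacebookSpider(scrapy.Spider):
        .......
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # self.crawler is not set yet inside __init__, so only record the flag here
            self.conc = 'conc' in kwargs

        def start_requests(self):
            # the engine and its downloader exist by the time this generator runs
            if self.conc:
                self.crawler.engine.downloader.total_concurrency = 32
            ....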
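To illustrate why mutating the downloader works while the settings cannot be changed, here is a toy model in plain Python. It is only a sketch: FrozenSettings and ToyDownloader are hypothetical stand-ins, not Scrapy's real classes, and the only assumption carried over from Scrapy is that the downloader copies CONCURRENT_REQUESTS into total_concurrency once and then compares the number of active requests against that attribute on every check.

```python
class FrozenSettings:
    """Toy stand-in for a frozen settings object (not Scrapy's real class)."""

    def __init__(self, values):
        self._values = dict(values)

    def getint(self, name):
        return int(self._values[name])

    def set(self, name, value):
        # Mimics the idea that settings cannot change once the crawl has started.
        raise TypeError("Trying to modify an immutable settings object")


class ToyDownloader:
    """Toy stand-in for the downloader: reads the setting once, keeps its own copy."""

    def __init__(self, settings):
        self.total_concurrency = settings.getint("CONCURRENT_REQUESTS")
        self.active = set()

    def needs_backout(self):
        # The limit is read from the attribute on every check,
        # so changing the attribute takes effect immediately.
        return len(self.active) >= self.total_concurrency


settings = FrozenSettings({"CONCURRENT_REQUESTS": 8})
downloader = ToyDownloader(settings)
downloader.active.update(range(8))   # eight requests in flight
print(downloader.needs_backout())    # True: the limit of 8 is reached
downloader.total_concurrency = 32    # raise the limit on the live object
print(downloader.needs_backout())    # False: there is room again
```

In this model the settings object is never touched after start-up, which is why raising the attribute on the live object is enough.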
Upvotes: 1
Reputation: 51
I just noticed that I can simply pass "-s CONCURRENT_REQUESTS=32" on the command line at runtime. Another option would have been to override the update_settings method; here's a reference for anybody who runs into this issue: Update scrapy settings based on spider property
Upvotes: 0