I have several different spiders and want to run all of them at once. Based on this and this, I can run multiple spiders in the same process. However, I don't know how to design a signal system that stops the reactor once all spiders are finished.
I have tried:
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
and
crawler.signals.connect(reactor.stop, signal=signals.spider_idle)
In both cases, the reactor stops when the first crawler closes. Of course, I want the reactor to stop only after all spiders have finished.
Could someone show me how to do this?
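For reference, this is roughly what my setup looks like (the spider names here are just placeholders):

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import signals
from scrapy.utils.project import get_project_settings

settings = get_project_settings()

for spider_name in ['spider_a', 'spider_b']:  # placeholder spider names
    crawler = Crawler(settings)
    crawler.configure()
    # stops the reactor on the FIRST spider_closed, which is too early
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    spider = crawler.spiders.create(spider_name)
    crawler.crawl(spider)
    crawler.start()

reactor.run()

As soon as the first spider closes, reactor.stop() is called and the remaining spiders are cut off.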
Upvotes: 5
Views: 2657
After a good night's sleep, I realized I know how to do this. All I need is a counter:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings


class ReactorControl:
    """Counts running crawlers and stops the reactor when the last one closes."""

    def __init__(self):
        self.crawlers_running = 0

    def add_crawler(self):
        self.crawlers_running += 1

    def remove_crawler(self):
        self.crawlers_running -= 1
        if self.crawlers_running == 0:
            reactor.stop()


def setup_crawler(spider_name):
    crawler = Crawler(settings)
    crawler.configure()
    # decrement the counter whenever one of the spiders closes
    crawler.signals.connect(reactor_control.remove_crawler, signal=signals.spider_closed)
    spider = crawler.spiders.create(spider_name)
    crawler.crawl(spider)
    reactor_control.add_crawler()
    crawler.start()


reactor_control = ReactorControl()
log.start()
settings = get_project_settings()
crawler = Crawler(settings)

for spider_name in crawler.spiders.list():
    setup_crawler(spider_name)

reactor.run()
I am assuming Scrapy does not run the crawlers in parallel threads, so there is no race condition on the counter.
I don't know if this is the best way to do it, but it works!
Edit: updated. See @Jean-Robert's comment.
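For completeness: if you are on a newer Scrapy release (1.0 or later), CrawlerProcess wraps this same pattern and stops the reactor for you once every crawler is done, so the counter is not needed. Something like this should be equivalent:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# schedule every spider registered in the project
for spider_name in process.spider_loader.list():
    process.crawl(spider_name)

# start() runs the reactor and stops it after all crawlers have finished
process.start()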
Upvotes: 7