Reputation: 15
I have this code, and when both spiders have finished, the program is still running.
#!C:\Python27\python.exe
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time
def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()

t1=threading.Thread(target=tescofcn)
t2=threading.Thread(target=carrfcn)
t1.start()
t2.start()
log.start()
reactor.run()
When I tried inserting this line into both functions:
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
the spider that finished first stopped the reactor for both spiders, so the slower spider was terminated before it had finished.
Upvotes: 1
Views: 620
Reputation: 1846
What you could do is create a function that checks the list of running spiders and connect it to signals.spider_closed.
from scrapy.utils.trackref import iter_all

def close_reactor_if_no_spiders():
    running_spiders = [spider for spider in iter_all('Spider')]
    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
That said, I would still recommend using scrapyd to manage running multiple spiders.
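If you would rather not depend on object tracking, a simpler variation on the same idea is to count spider_closed signals and stop the reactor only when the last expected spider has closed. This is only a sketch: make_stop_when_all_closed and on_spider_closed are hypothetical names, not part of Scrapy, and the lambda below stands in for reactor.stop so the snippet runs without Scrapy or Twisted installed. It assumes the handler is always called from the reactor thread, so the counter is not locked.

```python
def make_stop_when_all_closed(expected, stop):
    """Return a handler that calls stop() once it has fired `expected` times.

    `expected` is the number of spiders you started; `stop` is any
    zero-argument callable (reactor.stop in the real script).
    """
    # A dict is used instead of `nonlocal` so the sketch also runs on Python 2.7.
    state = {"closed": 0}

    def on_spider_closed(*args, **kwargs):
        # Signal handlers may receive keyword arguments such as the spider
        # and the close reason; they are ignored here.
        state["closed"] += 1
        if state["closed"] == expected:
            stop()

    return on_spider_closed


# Stand-in for reactor.stop, just to show when the callback fires.
calls = []
handler = make_stop_when_all_closed(2, lambda: calls.append("stopped"))

handler(spider="tesco")  # first spider closes: nothing happens yet
handler(spider="carr")   # second spider closes: stop() is called
print(calls)
```

In the script from the question you would create one handler with make_stop_when_all_closed(2, reactor.stop), keep it in a module-level variable, and connect that same handler to signals.spider_closed on both crawlers. Keeping a reference matters because the signal dispatcher may hold handlers only weakly.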
Upvotes: 1