user3499866

Reputation: 15

How to stop the reactor when both spiders have finished

I have this code, and when both spiders have finished, the program is still running.

#!C:\Python27\python.exe

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time

def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()


t1=threading.Thread(target=tescofcn)
t2=threading.Thread(target=carrfcn)

t1.start()
t2.start()
log.start()
reactor.run()

When I tried inserting this into both functions:

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

the spider that finished first stopped the reactor for both, and the slower spider was terminated even though it had not finished.

Upvotes: 1

Views: 620

Answers (1)

marven

Reputation: 1846

What you could do is create a function that checks the list of running spiders and connect it to signals.spider_closed.

from twisted.internet import reactor
from scrapy.utils.trackref import iter_all


def close_reactor_if_no_spiders():
    # iter_all('Spider') yields every live spider object Scrapy is tracking;
    # once the list is empty, no spider is still running.
    running_spiders = [spider for spider in iter_all('Spider')]

    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)

That said, I would still recommend using scrapyd to manage running multiple spiders.
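The core of the fix is generic: don't stop the reactor on the first spider_closed signal, only on the last one. A minimal sketch of that "countdown" pattern in plain Python (no Scrapy required; `make_stopper` and its wiring are hypothetical names, not part of either library):

```python
def make_stopper(n_spiders, stop):
    """Return a callback that invokes stop() only after n_spiders calls.

    In the question's setup you would build one stopper for both crawlers,
    e.g. make_stopper(2, reactor.stop), and connect the returned callback
    to each crawler's spider_closed signal instead of reactor.stop itself.
    """
    remaining = [n_spiders]  # mutable cell shared by every invocation

    def on_spider_closed(*args, **kwargs):
        # Scrapy passes the spider (and more) to signal handlers; accept
        # and ignore any arguments so the callback wires up cleanly.
        remaining[0] -= 1
        if remaining[0] == 0:
            stop()  # only the final close triggers the shutdown

    return on_spider_closed
```

This avoids scanning trackref entirely: each crawler decrements the shared counter when its spider closes, and the reactor is stopped exactly once, after the second close.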

Upvotes: 1
