Jijo

Reputation: 279

signal only works in main thread

I am new to Django. I am trying to run my Scrapy spider from a Django view. My Scrapy code works perfectly when I run it from the command prompt, but when I run it from Django it fails with the error message: signal only works in main thread.

Here is the code in my Django view:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.crawler import CrawlerProcess
from scrapy import log, signals
from Working.spiders.workSpider import WorkSpider
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings

spider = WorkSpider(domain='scrapinghub.com')
crawler = CrawlerProcess(Settings())
crawler.start()
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

Please help me solve this. Thank you.

Upvotes: 27

Views: 25905

Answers (2)

Alon Barad

Reputation: 1981

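I found a way to crawl without installing signal handlers, by passing install_signal_handlers=False to CrawlerProcess.start() (available in recent Scrapy versions):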

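from typing import Optional, Type

from scrapy import Spider
from scrapy.crawler import CrawlerProcess


def crawl(spider: Type[Spider], spider_kwargs: Optional[dict] = None):
    spider_kwargs = {} if spider_kwargs is None else spider_kwargs
    crawler = CrawlerProcess()
    # Schedule the spider first; start() then runs the reactor until it finishes.
    crawler.crawl(spider, **spider_kwargs)
    # Skip OS signal handlers, which can only be installed from the main thread.
    crawler.start(stop_after_crawl=True, install_signal_handlers=False)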

Usage

from scrapy import Spider

if __name__ == "__main__":

    class BaseSpider(Spider):
        name: str


    crawl(BaseSpider, { "name": "base_spider" })

Upvotes: 3

Tigrou
Tigrou

Reputation: 201

The error means that your code is not running in the main thread, and Python only allows signal handlers to be installed from the main thread.
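The restriction comes from Python itself rather than from Scrapy or Django. A minimal stdlib-only reproduction shows signal.signal() succeeding in the main thread and raising ValueError in a worker thread, which is, for example, where Django's development server runs each request by default. The function name try_install_handler is made up for this sketch.

```python
import signal
import threading


def try_install_handler() -> str:
    """Try to install a SIGINT handler and report the outcome as a string."""
    try:
        # Re-installing Python's default SIGINT handler changes nothing in the
        # main thread, but signal.signal() raises ValueError in any other thread.
        signal.signal(signal.SIGINT, signal.default_int_handler)
        return "ok"
    except ValueError as exc:
        return str(exc)


# Called from the main thread: succeeds.
main_result = try_install_handler()

# Called from a worker thread, as a threaded server would do: fails.
worker_results = []
worker = threading.Thread(target=lambda: worker_results.append(try_install_handler()))
worker.start()
worker.join()
worker_result = worker_results[0]

print("main thread:", main_result)
print("worker thread:", worker_result)
```

The exact wording of the ValueError differs slightly between Python versions, but it starts with "signal only works in main thread", the same message as in the question.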

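Switching from CrawlerProcess to CrawlerRunner solved the problem for me. CrawlerProcess extends CrawlerRunner by starting a Twisted reactor and installing handlers for shutdown signals such as Ctrl-C; CrawlerRunner does neither, so it does not try to install signal handlers from your (non-main) thread: http://doc.scrapy.org/en/latest/topics/api.html#scrapy.crawler.CrawlerRunner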
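One thing to keep in mind when using CrawlerRunner from a view: it does not start the Twisted reactor for you, and a reactor cannot be restarted once it has stopped (a second reactor.run() raises ReactorNotRestartable). A common approach is to start one event loop for the whole process in a background thread (for Twisted, reactor.run(installSignalHandlers=False), or a helper library such as crochet) and hand each crawl to it from the request thread, for example with reactor.callFromThread. The sketch below shows that pattern using the standard library's asyncio as a stand-in for the reactor; fake_crawl and run_job are hypothetical placeholders, not Scrapy or Twisted APIs.

```python
import asyncio
import threading

# One event loop for the whole process, started once in a daemon thread.
# Stand-in for the Twisted reactor, which also must only be started once.
_loop = asyncio.new_event_loop()


def _run_loop() -> None:
    asyncio.set_event_loop(_loop)
    # Runs until the process exits; driven from a non-main thread.
    _loop.run_forever()


threading.Thread(target=_run_loop, daemon=True).start()


async def fake_crawl(name: str) -> str:
    """Hypothetical placeholder for a crawl job scheduled on the shared loop."""
    await asyncio.sleep(0.01)
    return f"crawled {name}"


def run_job(name: str, timeout: float = 5.0) -> str:
    """Submit a job to the shared loop from any thread and wait for its result."""
    future = asyncio.run_coroutine_threadsafe(fake_crawl(name), _loop)
    return future.result(timeout=timeout)


print(run_job("example"))  # crawled example
```

Each request thread calls run_job() instead of starting and stopping its own loop, so the loop is started exactly once per process.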

Hope this helps.

Upvotes: 7
