Reputation: 279
I am new to Django. I am trying to run my Scrapy spider from a Django view. The spider works perfectly when I run it from the command prompt, but when I try to run it from Django it fails with the error message: signal only works in main thread.
My code in the Django view is as follows:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.crawler import CrawlerProcess
from scrapy import log, signals
from Working.spiders.workSpider import WorkSpider
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
spider = WorkSpider(domain='scrapinghub.com')
crawler = CrawlerProcess(Settings())
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
Please help me solve this. Thank you.
Upvotes: 27
Views: 25905
Reputation: 1981
I found a way to crawl without using signals:
from typing import Type
from scrapy import Spider
from scrapy.crawler import CrawlerProcess

def crawl(spider: Type[Spider], spider_kwargs: dict = None):
    spider_kwargs = {} if spider_kwargs is None else spider_kwargs
    crawler = CrawlerProcess()
    crawler.crawl(spider, **spider_kwargs)
    # Skip installing signal handlers, which only works in the main thread.
    crawler.start(stop_after_crawl=True, install_signal_handlers=False)
Usage:
from scrapy import Spider

if __name__ == "__main__":
    class BaseSpider(Spider):
        name: str

    crawl(BaseSpider, {"name": "base_spider"})
Upvotes: 3
Reputation: 201
The error basically says that you are not in the main thread, so the signal handler cannot be installed.
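To illustrate the point about the main thread, here is a minimal sketch that uses only the Python standard library (no Scrapy or Django). The function name `install_handler_in_worker` is made up for this example. Python only allows `signal.signal()` to be called from the main thread, and Django typically serves requests from worker threads, which is presumably why installing the crawler's shutdown handlers fails there.

```python
import signal
import threading


def install_handler_in_worker():
    """Try to register a signal handler from a non-main thread.

    Returns the error message raised by signal.signal(), or None if
    the call unexpectedly succeeded.
    """
    outcome = {}

    def worker():
        try:
            # Registering any handler outside the main thread is rejected.
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            outcome["error"] = None
        except ValueError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


error_message = install_handler_in_worker()
print(error_message)
```

The message printed here starts with the same text as the error from the question, which suggests the failure comes from where the code runs and not from the spider itself.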
Switching from CrawlerProcess to CrawlerRunner solved the problem for me (I guess because CrawlerRunner does not install signal handlers itself): http://doc.scrapy.org/en/latest/topics/api.html#scrapy.crawler.CrawlerRunner
Hope this helps.
Upvotes: 7