olegario

Reputation: 742

Scrapy - Spiders taking too long to shut down

Basically, I have a file named spiders.py in which I configure all my spiders and fire them all, using a single crawler. This is the source code of that file:

from scrapy import spiderloader
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from navigator import *


def main():
    settings = get_project_settings()
    spider_loader = spiderloader.SpiderLoader.from_settings(settings)
    process = CrawlerProcess(settings=settings)
    # schedule every spider in the project on the same crawler process
    for spider_name in spider_loader.list():
        process.crawl(spider_name)

    # blocking call: runs until all crawlers finish
    process.start()


if __name__ == '__main__':
    main()

What I'm trying to achieve is to fire these spiders from another script using the subprocess module and, after 5 minutes of execution, shut down all spiders (using only one SIGTERM). The file responsible for this is monitor.py:

from time import sleep
import os
import signal
import subprocess

def main():
    # run spiders.py in its own process group so the whole group can be signalled
    spiders_process = subprocess.Popen(["python", "spiders.py"], stdout=subprocess.PIPE,
                                       shell=False, preexec_fn=os.setsid)
    sleep(300)
    # after 5 minutes, send SIGTERM to the spiders' process group
    os.killpg(spiders_process.pid, signal.SIGTERM)

if __name__ == '__main__':
    main()

When the main thread wakes up, the terminal says 2018-07-19 21:45:09 [scrapy.crawler] INFO: Received SIGTERM, shutting down gracefully. Send again to force. But even after this message, the spiders continue to scrape the web pages. What am I doing wrong?

OBS: Is it possible to fire all the spiders inside spiders.py without blocking the main process?

Upvotes: 2

Views: 1584

Answers (1)

John Smith

Reputation: 686

I believe that when Scrapy receives a SIGTERM it tries to shut down gracefully by first waiting for all sent/scheduled requests to finish. Your best bet is either to limit the number of concurrent requests so it finishes quicker (CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN are 16 and 8 respectively by default), or to send two SIGTERMs to instruct Scrapy to do an unclean, immediate exit.
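For the second option, a minimal sketch of how the monitor.py from your question could send the signal twice (the 10-second grace period between the two signals is an arbitrary choice):

from time import sleep
import os
import signal
import subprocess

def main():
    spiders_process = subprocess.Popen(["python", "spiders.py"], stdout=subprocess.PIPE,
                                       shell=False, preexec_fn=os.setsid)
    sleep(300)
    pgid = os.getpgid(spiders_process.pid)
    os.killpg(pgid, signal.SIGTERM)   # first SIGTERM: ask Scrapy to shut down gracefully
    sleep(10)                         # arbitrary grace period
    os.killpg(pgid, signal.SIGTERM)   # second SIGTERM: force an unclean, immediate exit
    spiders_process.wait()

if __name__ == '__main__':
    main()

The alternative is simply lowering CONCURRENT_REQUESTS (and CONCURRENT_REQUESTS_PER_DOMAIN) in your project's settings.py, at the cost of slower crawling overall.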

OBS: Is it possible to fire all the spiders inside spiders.py without blocking the main process?

process.start() starts the Twisted reactor (Twisted's main event loop), which is a blocking call. To get around that and run more code after the reactor has started, you can schedule a function to be run inside the loop. The first snippet in this manual should give you an idea: https://twistedmatrix.com/documents/current/core/howto/time.html.
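For example, here is a rough sketch (untested, assuming the default reactor and leaving out your project-specific navigator import) of your spiders.py using reactor.callLater to ask the CrawlerProcess to stop 300 seconds after the reactor starts, instead of sleeping in a separate monitoring process:

from twisted.internet import reactor

from scrapy import spiderloader
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def main():
    settings = get_project_settings()
    spider_loader = spiderloader.SpiderLoader.from_settings(settings)
    process = CrawlerProcess(settings=settings)
    for spider_name in spider_loader.list():
        process.crawl(spider_name)

    # runs inside the event loop, so it does not block the reactor;
    # process.stop() asks all running crawlers to stop gracefully
    reactor.callLater(300, process.stop)

    process.start()  # blocks until all crawlers have finished or been stopped


if __name__ == '__main__':
    main()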

However, if you go that way, you must make sure that the code you schedule is also non-blocking; otherwise, if you pause the execution of the loop for too long, bad things can start happening. So things like time.sleep() must be rewritten using a Twisted equivalent.
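As a small illustration of that point, a non-blocking wait in Twisted looks roughly like this, with task.deferLater standing in for time.sleep() (in the Scrapy case you would not call reactor.run() yourself, since process.start() already runs the reactor):

from twisted.internet import reactor, task

def after_delay():
    # runs 5 seconds later; the reactor kept serving other events in the meantime
    print("5 seconds passed without blocking the event loop")
    reactor.stop()

# schedules after_delay and returns a Deferred, instead of blocking like time.sleep(5)
task.deferLater(reactor, 5, after_delay)
reactor.run()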

Upvotes: 1

Related Questions