sojowok

Reputation: 141

How to restart a Scrapy spider

What I need:

  1. start crawler
  2. crawler job done
  3. wait 1 minute
  4. start crawler again

I tried this:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep

while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
    sleep(60)

But I get this error:

twisted.internet.error.ReactorNotRestartable

Please help me do it right.

Python 3.6
Scrapy 1.3.2
Linux

Upvotes: 9

Views: 4390

Answers (2)

Haritz Laboa

Reputation: 768

To avoid the ReactorNotRestartable error, you can create a main.py file that launches the crawler repeatedly from the shell using subprocesses. Each crawl then runs in its own process with its own reactor, so the reactor never needs to be restarted.

This main.py file could look like this:

from time import sleep
import subprocess

timeout = 60

while True:
    command = 'scrapy crawl yourSpiderName'
    subprocess.run(command, shell=True)
    sleep(timeout)
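As a side note, subprocess.run also accepts the command as a list of arguments, which avoids shell=True (no shell string parsing, no injection risk if the spider name ever comes from user input). A minimal sketch, with echo standing in for the scrapy command so it runs even without Scrapy installed:

```python
import subprocess

# The real command would be ['scrapy', 'crawl', 'yourSpiderName'];
# echo is a stand-in so this snippet runs anywhere.
command = ['echo', 'crawl done']

# subprocess.run blocks until the command exits and returns a
# CompletedProcess; returncode is 0 on success.
result = subprocess.run(command, capture_output=True, text=True)
print(result.returncode)
print(result.stdout.strip())
```

You can check result.returncode after each run to detect a crawl that crashed before sleeping and retrying.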

Upvotes: 6

sojowok

Reputation: 141

I think I found the solution:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from twisted.internet import task


timeout = 60


def run_spider():
    l.stop()  # pause the loop while the crawl is running
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # Re-arm the loop once the crawl finishes; now=False makes the
    # next run wait a full timeout instead of firing immediately.
    d.addBoth(lambda _: l.start(timeout, False))


l = task.LoopingCall(run_spider)
l.start(timeout)  # now=True by default: the first crawl starts immediately

reactor.run()

Upvotes: 5
