Reputation: 141
What I need: to run my Scrapy spider periodically, once every 60 seconds.
I tried this:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep

while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
    sleep(60)
But I get this error:
twisted.internet.error.ReactorNotRestartable
Please help me do it right.
Python 3.6
Scrapy 1.3.2
Linux
Upvotes: 9
Views: 4390
Reputation: 768
To avoid the ReactorNotRestartable error, you can create a main.py file that calls the crawler repeatedly in a subprocess, so each run gets a fresh Python interpreter and therefore a fresh reactor.
This main.py file could look like this:
from time import sleep
import subprocess

timeout = 60

while True:
    # Each subprocess starts a new interpreter, so Scrapy
    # gets a brand-new reactor on every run.
    command = 'scrapy crawl yourSpiderName'
    subprocess.run(command, shell=True)
    sleep(timeout)
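A variant of the same idea, sketched with the command passed as an argument list so `shell=True` isn't needed; the helper names here are my own, not part of Scrapy:

```python
import subprocess
from time import sleep

def crawl_once(command):
    """Run one crawl in a child process; True if it exited cleanly."""
    # Passing the command as a list avoids shell=True and quoting issues.
    result = subprocess.run(command)
    return result.returncode == 0

def crawl_forever(command, timeout=60):
    """Re-run the crawl every `timeout` seconds until a run fails."""
    while crawl_once(command):
        sleep(timeout)

# In the real main.py you would call something like:
#     crawl_forever(['scrapy', 'crawl', 'yourSpiderName'])
```

Checking the return code also lets the loop stop (or alert) if the spider starts failing, instead of retrying forever.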
Upvotes: 6
Reputation: 141
I think I found the solution:
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from twisted.internet import task

timeout = 60

def run_spider():
    # Pause the timer while the crawl is running.
    l.stop()
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    # When the crawl finishes, restart the timer; now=False means
    # the next run fires a full `timeout` later, not immediately.
    d.addBoth(lambda _: l.start(timeout, False))

l = task.LoopingCall(run_spider)
l.start(timeout)

reactor.run()
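The timing idea above (stop the LoopingCall while the crawl runs, restart it with now=False once the deferred fires) can be sketched without Twisted. A hypothetical asyncio analogue, where `job()` stands in for `runner.crawl()`:

```python
import asyncio

async def job():
    """Stand-in for runner.crawl(): some asynchronous work."""
    await asyncio.sleep(0)
    return "crawl finished"

async def run_every(timeout, repeats):
    """Run job(), then wait `timeout` seconds *after* it completes
    before the next run -- the same effect as l.start(timeout, False)."""
    results = []
    for i in range(repeats):
        results.append(await job())       # timer is paused while job runs
        if i < repeats - 1:
            await asyncio.sleep(timeout)  # countdown restarts only afterwards
    return results
```

Either way, the point is that the delay is measured from the end of one crawl to the start of the next, so runs never overlap and the single reactor is never restarted.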
Upvotes: 5