Moataz Ghazy

Reputation: 239

Is there a way to restart a scrapy crawler?

I was wondering if there is a way to restart a scrapy crawler. This is what my code looks like:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess

results = set()


class SitemapCrawler(CrawlSpider):

    name = "Crawler"
    start_urls = ['http://www.example.com']
    allowed_domains = ['www.example.com']
    rules = [Rule(LinkExtractor(), callback='parse_links', follow=True)]

    def parse_links(self, response):
        href = response.xpath('//a/@href').getall()
        results.add(response.url)
        for link in href:
            results.add(link)


process = CrawlerProcess()


def start():
    process.crawl(SitemapCrawler)
    process.start()
    for link in results:
        print(link)

If I try calling start() twice, it runs once and then gives me this error:

raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

I know this is a general question, so I don't expect any code, but I just want to know how I can fix this issue. Thanks in advance.
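
The reactor that CrawlerProcess starts is Twisted's global reactor, and Twisted does not allow a reactor to be restarted once it has stopped, which is why the second start() call raises ReactorNotRestartable. One commonly used workaround is to run each crawl in its own child process so that every call gets a fresh reactor. This is only a minimal sketch, assuming the SitemapCrawler class and results set defined above; the _run_crawl helper name is just for illustration:

import multiprocessing

from scrapy.crawler import CrawlerProcess


def _run_crawl(queue):
    # Runs in a child process, so Twisted gets a brand-new reactor each time.
    process = CrawlerProcess()
    process.crawl(SitemapCrawler)
    process.start()           # blocks until the crawl finishes
    queue.put(list(results))  # send the collected links back to the parent


def start():
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=_run_crawl, args=(queue,))
    worker.start()
    links = queue.get()       # read results before joining to avoid blocking
    worker.join()
    for link in links:
        print(link)

With this arrangement start() can be called as many times as needed, because the reactor that cannot be restarted lives and dies inside the child process.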

Upvotes: 1

Views: 842

Answers (1)

Mohammadtaher Abbasi

Reputation: 96

from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class MySpider(scrapy.Spider):
    # Spider definition goes here
    ...


configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(MySpider)

def finished(result):
    # Called once the crawl has finished; stop the reactor so run() can return.
    print("finished :D")
    reactor.stop()

d.addBoth(finished)
reactor.run()  # blocks here until the crawl is done
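
Because CrawlerRunner leaves starting and stopping the reactor to your own code, you can also schedule several crawls on the same reactor instead of trying to restart it. A minimal sketch of running the same spider twice in a row, following the deferred-chaining pattern from Scrapy's documentation (this replaces the single-crawl block above rather than being added to it; crawl_twice is just an illustrative name):

from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl_twice():
    # Each yield waits for the previous crawl to finish before the next one starts.
    yield runner.crawl(MySpider)
    yield runner.crawl(MySpider)
    reactor.stop()

crawl_twice()
reactor.run()  # blocks until both crawls have finished and reactor.stop() is called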

Upvotes: 1
