SVSerhii

Reputation: 23

Scrapy: all spiders fail when one spider has a syntax error

Sometimes all of my scrapers fail when just one of them contains an error. For example, I have a spider with a syntax error that went unnoticed:

class MySpiderWithSyntaxError(scrapy.Spider):
    name = "my_spider_with_syntax_error"

    start_urls = [
        'http://www.website.com'
    ]

    def parse(self response):
        for url in response.css('a.p::attr(href)').extract():
            print url

This spider is missing a comma in the line

def parse(self response):

So the spider MySpiderWithSyntaxError fails, as expected. But if I run another spider that has no syntax error (code below)

class MySpiderWithoutSyntaxError(scrapy.Spider):
    name = "my_spider_without_syntax_error"

    start_urls = [
        'http://www.website.com'
    ]

    def parse(self, response):
        for url in response.css('a.p::attr(href)').extract():
            print url

I get an error like this:

    Traceback (most recent call last):
    File "/home/Documents/project/.env/bin/scrapy", line 11, in <module>
       sys.exit(execute())
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 141, in execute
       cmd.crawler_process = CrawlerProcess(settings)
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 238, in __init__
       super(CrawlerProcess, self).__init__(settings)
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 129, in __init__
       self.spider_loader = _get_spider_loader(settings)
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
       return loader_cls.from_settings(settings.frozencopy())
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 33, in from_settings
       return cls(settings)
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 20, in __init__
       self._load_all_spiders()
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
       for module in walk_modules(name):
    File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
       submod = import_module(fullpath)
    File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
       __import__(name)
    File "/home/Documents/project/scrapers/scrapy/spiders/my_spider_with_syntax_error.py", line 14
       def parse(self response):
                     ^
    SyntaxError: invalid syntax

Question: Is it possible to catch errors like this, so that only the spider with the syntax error fails while the other spiders keep working?

Upvotes: 2

Views: 394

Answers (1)

Tomáš Linhart

Reputation: 10210

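In a Scrapy project, all spider modules are loaded even when you run a single spider (with scrapy crawl <spidername>). The traceback shows this: the spider loader walks every module in the spiders package and imports each one. Hence, if any of them contains a syntax error, the import raises and you get this error no matter which spider you asked for.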
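To illustrate the general idea of failing only the broken module, here is a minimal sketch of a tolerant loader written in plain Python 3 (the question uses Python 2.7, so treat this as an illustration rather than a drop-in fix). It is not Scrapy's own loader: the function name load_modules_tolerantly is hypothetical, and it only looks at the direct submodules of a package. It imports each module separately and records the ones that raise SyntaxError or ImportError instead of letting a single failure abort everything.

```python
import importlib
import logging
import pkgutil

logger = logging.getLogger(__name__)


def load_modules_tolerantly(package_name):
    """Import each direct submodule of a package, skipping broken ones.

    Hypothetical helper for illustration; not part of Scrapy.
    Returns (loaded, failed): a list of imported module objects and a
    dict mapping module name to the exception raised while importing it.
    """
    package = importlib.import_module(package_name)
    loaded, failed = [], {}
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix):
        try:
            loaded.append(importlib.import_module(info.name))
        except (SyntaxError, ImportError) as exc:
            # Record the failure and keep going with the remaining modules.
            failed[info.name] = exc
            logger.warning("Skipping %s: %r", info.name, exc)
    return loaded, failed
```

In a real project the same try/except would have to live in a custom spider loader class registered through the SPIDER_LOADER_CLASS setting. Newer Scrapy releases also document a SPIDER_LOADER_WARN_ONLY setting that turns loader import failures into warnings; whether it covers SyntaxError as well as ImportError depends on the version, so check the documentation for the release you have installed.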

Upvotes: 1
