Reputation: 23
Sometimes all of my scrapers fail when just one of them contains an error. For example, I have a spider with a syntax error that went unnoticed:
class MySpiderWithSyntaxError(scrapy.Spider):
    name = "my_spider_with_syntax_error"
    start_urls = [
        'http://www.website.com'
    ]

    def parse(self response):
        for url in response.css('a.p::attr(href)').extract():
            print url
This spider is missing a comma in the line
    def parse(self response):
so MySpiderWithSyntaxError fails, as expected. But when I run another spider that has no syntax error (code below):
class MySpiderWithoutSyntaxError(scrapy.Spider):
    name = "my_spider_without_syntax_error"
    start_urls = [
        'http://www.website.com'
    ]

    def parse(self, response):
        for url in response.css('a.p::attr(href)').extract():
            print url
I still get an error like this:
Traceback (most recent call last):
  File "/home/Documents/project/.env/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 141, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 238, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 129, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 33, in from_settings
    return cls(settings)
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 20, in __init__
    self._load_all_spiders()
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
    for module in walk_modules(name):
  File "/home/Documents/project/.env/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/Documents/project/scrapers/scrapy/spiders/my_spider_with_syntax_error.py", line 14
    def parse(self response):
                          ^
SyntaxError: invalid syntax
Question: Is it possible to catch errors like this, so that only the spider with the syntax error fails while the other spiders keep working?
Upvotes: 2
Views: 394
Reputation: 10210
If you use a Scrapy project, all the spider modules are loaded even when you run a single spider (using scrapy crawl <spidername>). Hence, if any of them contains a syntax error, loading fails and you get the error no matter which spider you asked for.
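Since every module is imported at startup, one way to find the offending file before running a crawl is to compile each spider module up front. The following is only a minimal sketch using the standard library: `find_syntax_errors` is an illustrative helper (not part of Scrapy), and the `scrapers/spiders` path is a hypothetical placeholder for your own spiders directory.

```python
import os


def find_syntax_errors(directory):
    """Return {path: SyntaxError} for each .py file under `directory` that does not compile."""
    broken = {}
    for root, _dirs, files in os.walk(directory):
        for filename in sorted(files):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(root, filename)
            # Read bytes so compile() can honour any encoding declaration in the file.
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                # compile() only parses the source; the module is never executed.
                compile(source, path, "exec")
            except SyntaxError as exc:
                broken[path] = exc
    return broken


if __name__ == "__main__":
    # Hypothetical location: replace with the directory that holds your spiders.
    for path, error in find_syntax_errors("scrapers/spiders").items():
        print("%s:%s: %s" % (path, error.lineno, error.msg))
```

This does not make Scrapy skip the broken module; it only reports which files would break loading, so they can be fixed or moved out of the spiders package before running scrapy crawl.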
Upvotes: 1