Reputation: 2678
I wrote a crawler with Scrapy.
There is a function in the pipeline where I write my data to a database. I use the logging module to log runtime messages.
I found that when my string contains Chinese characters, logging.error() throws an exception. But the crawler keeps running!
I know this is a minor error, but if there is a critical exception I will miss it because the crawler keeps running.
My question is: Is there a setting that forces Scrapy to stop when there is an exception?
Upvotes: 7
Views: 4713
Reputation: 2415
You can use CLOSESPIDER_ERRORCOUNT
An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won't be closed by number of errors.
By default it is set to 0:
CLOSESPIDER_ERRORCOUNT = 0
You can change it to 1 if you want the spider to close as soon as it hits the first error.
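For example, a one-line sketch in your project's settings.py:

# settings.py -- close the spider as soon as it records the first error
CLOSESPIDER_ERRORCOUNT = 1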
UPDATE
Reading the answers to this question, you can also use:
crawler.engine.close_spider(self, 'log message')
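To sketch what that looks like from inside a spider callback (the parse method and the status check here are only illustrative), self is the spider and self.crawler is the running crawler:

def parse(self, response):
    # Illustrative condition -- replace with whatever failure you care about
    if response.status >= 500:
        # Ask the engine to stop this spider, with a log message as the reason
        self.crawler.engine.close_spider(self, 'stopping: got a %d response' % response.status)
        return
    # ... normal parsing continues here ...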
Upvotes: 10
Reputation: 2061
In the process_item function of your pipeline you receive the spider instance as an argument.
To solve your problem, you could catch the exception when you insert your data, and then neatly stop your spider if you catch a certain exception, like this:
def process_item(self, item, spider):
    try:
        # Insert your item into the database here
        ...
    except YourExceptionName:
        # close_spider expects the spider instance, not the pipeline (self)
        spider.crawler.engine.close_spider(spider, reason='finished')
    return item
Upvotes: 3
Reputation: 20748
I don't know of a setting that would close the crawler on any exception, but you have at least a couple of options:
- you can raise a CloseSpider exception in a spider callback, maybe when you catch that exception you mention
- you can call crawler.engine.close_spider(spider, 'some reason') if you have a reference to the crawler and spider objects, for example in an extension. See how the CloseSpider extension is implemented (it's not the same as the CloseSpider exception). You could hook this with the spider_error signal, for example; a sketch of such an extension follows below.
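A minimal sketch of what that could look like, assuming a made-up class name CloseOnAnyError and module path myproject.extensions (only signals.spider_error, crawler.signals.connect and engine.close_spider come from Scrapy itself):

from scrapy import signals

class CloseOnAnyError:
    """Close the spider as soon as any spider callback raises an exception."""

    def __init__(self, crawler):
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        # spider_error is sent whenever a callback raises an unhandled exception
        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        return ext

    def spider_error(self, failure, response, spider):
        # Ask the engine to shut the spider down, recording the exception as the reason
        self.crawler.engine.close_spider(spider, 'spider_error: %s' % failure.value)

Then enable it in settings.py (the module path is hypothetical):

EXTENSIONS = {
    'myproject.extensions.CloseOnAnyError': 500,
}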
Upvotes: 1