scott huang

Reputation: 2678

How to force Scrapy to exit when there is an exception

I wrote a crawler with Scrapy.

There is a function in the pipeline where I write my data to a database. I use the logging module to record runtime messages.

I found that when my string contains Chinese characters, logging.error() throws an exception. But the crawler keeps running!

I know this is a minor error, but if a critical exception occurs, I will miss it because the crawler keeps running.

My question is: is there a setting that forces Scrapy to stop when there is an exception?

Upvotes: 7

Views: 4713

Answers (3)

parik

Reputation: 2415

You can use the CLOSESPIDER_ERRORCOUNT setting:

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won't be closed by number of errors.

By default it is set to 0 (CLOSESPIDER_ERRORCOUNT = 0). You can change it to 1 if you want to exit on the first error.
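For example, in your project's settings.py:

    # settings.py
    # Close the spider as soon as the first error is logged.
    CLOSESPIDER_ERRORCOUNT = 1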

UPDATE

As suggested in the answers to this question, you can also use:

    crawler.engine.close_spider(self, 'log message')

For more information, read:

Close spider extension
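For reference, here is a minimal sketch of that call made from inside a spider callback, where self is the spider instance (the spider name and error condition are hypothetical):

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'my_spider'
        start_urls = ['https://example.com']

        def parse(self, response):
            if not response.css('title'):  # hypothetical error condition
                # Ask the engine to close the spider gracefully
                self.crawler.engine.close_spider(self, 'log message')
                return
            yield {'title': response.css('title::text').get()}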

Upvotes: 10

Adrien Blanquer

Reputation: 2061

In the process_item method of your pipeline, you have access to the spider instance.

To solve your problem, you can catch the exception when you insert your data, then cleanly stop the spider if a certain exception is raised, like this:

    def process_item(self, item, spider):
        try:
            # Insert your item into the database here
            ...
        except YourExceptionName:
            # Pass the spider (not the pipeline) to close_spider
            spider.crawler.engine.close_spider(spider, reason='finished')
        return item

Upvotes: 3

paul trmbrth

Reputation: 20748

I don't know of a setting that would close the crawler on any exception, but you have at least a couple of options:

  • you can raise the CloseSpider exception in a spider callback, for example when you catch the exception you mention (see the first sketch after this list)
  • you can call crawler.engine.close_spider(spider, 'some reason') if you have a reference to the crawler and spider objects, for example in an extension. See how the CloseSpider extension is implemented (it's not the same as the CloseSpider exception). You could hook this up with the spider_error signal, for example (see the second sketch after this list).
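Minimal sketches of both options follow (the spider name, error condition, extension name, and module path are all hypothetical). First, raising CloseSpider from a callback:

    import scrapy
    from scrapy.exceptions import CloseSpider

    class ExampleSpider(scrapy.Spider):
        name = 'example'

        def parse(self, response):
            if response.status != 200:  # hypothetical error condition
                # Scrapy will stop scheduling new requests and shut the
                # spider down with the given reason
                raise CloseSpider('unexpected response status')

Second, an extension that hooks the spider_error signal and closes the spider on the first callback exception:

    from scrapy import signals

    class CloseOnSpiderError:
        def __init__(self, crawler):
            self.crawler = crawler
            # spider_error is sent whenever a spider callback raises
            crawler.signals.connect(self.on_spider_error,
                                    signal=signals.spider_error)

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def on_spider_error(self, failure, response, spider):
            self.crawler.engine.close_spider(spider, 'spider_error')

Enable it in settings.py (the module path is hypothetical):

    EXTENSIONS = {
        'myproject.extensions.CloseOnSpiderError': 0,
    }

Note that spider_error covers exceptions raised in spider callbacks; for an exception raised inside an item pipeline, as in the question, the try/except approach above (or the item_error signal) applies instead.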

Upvotes: 1
