buzz

Reputation: 356

How can I determine whether Scrapy encountered errors, in the Pipeline.close_spider() method?

I have a Scrapy spider and Pipeline setup.

My Spider extracts data from a website and my Pipeline's process_item() method inserts the extracted data into a temporary database table.

At the end, in the Pipeline's close_spider() method I run some error checks on the temporary database table and if things look okay, then I make the temporary table permanent.
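A minimal sketch of a pipeline like this, assuming a SQLite backing store and hypothetical `items_tmp`/`items` table names (the question doesn't say which database is used):

```python
import sqlite3


class TempTablePipeline:
    """Hypothetical pipeline: stage items in a temporary table,
    then promote them to the permanent table on close."""

    def __init__(self, db_path="scraped.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS items_tmp (url TEXT, title TEXT)")
        self.conn.execute("DELETE FROM items_tmp")  # start with a clean staging table

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO items_tmp VALUES (?, ?)", (item["url"], item["title"])
        )
        return item

    def close_spider(self, spider):
        # Run error checks on the staged rows, e.g. a minimum row count,
        # then make the data permanent.
        (count,) = self.conn.execute("SELECT COUNT(*) FROM items_tmp").fetchone()
        if count > 0:
            self.conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, title TEXT)")
            self.conn.execute("INSERT INTO items SELECT * FROM items_tmp")
        self.conn.commit()
        self.conn.close()
```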

However, if Scrapy encounters exceptions before the Pipeline's close_spider() method is called, then it's possible that the extracted data is incomplete.

Is there a way to check, in the Pipeline's close_spider() method, whether Scrapy has encountered exceptions? If there were errors (indicating that the extracted data may be incomplete), I do not want to make the temporary table permanent.

I am using the CloseSpider extension with CLOSESPIDER_ERRORCOUNT set to 1 to close the Spider on the first error. However, I haven't figured out how to distinguish between a normal close and an error close in the Pipeline's close_spider() method.

Upvotes: 1

Views: 214

Answers (1)

buzz

Reputation: 356

I was able to do this using signals in Scrapy. I'm posting the answer here in case someone else runs into this.

I registered for the spider_error signal and provided a callback handler in the spider itself.

The callback set a flag on the spider to indicate that it had encountered an error.

In the pipeline's close_spider() method, I checked whether the error flag was set on the spider to distinguish between a normal close and an error close.

Upvotes: 1
