Reputation: 356
I have a Scrapy spider and pipeline setup. My spider extracts data from a website, and my pipeline's process_item() method inserts the extracted data into a temporary database table. At the end, in the pipeline's close_spider() method, I run some error checks on the temporary table, and if everything looks okay, I make the temporary table permanent.
However, if Scrapy encounters exceptions before the pipeline's close_spider() method is called, it's possible that the extracted data is incomplete. Is there a way, from within the pipeline's close_spider() method, to check whether Scrapy encountered exceptions during the crawl? If there were errors (indicating that the extracted data may be incomplete), I do not want to make the temporary table permanent.
I am using the CloseSpider extension with CLOSESPIDER_ERRORCOUNT set to 1 to close the spider on the first error. However, I haven't figured out how to distinguish between a normal close and an error close in the pipeline's close_spider() method.
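For context, here is a condensed sketch of the setup; the SQLite backend, table names, and check logic are all placeholders, and CLOSESPIDER_ERRORCOUNT = 1 is set in settings.py as described above:

    import sqlite3

    class TempTablePipeline:
        """Writes items to a temporary table; promotes it on close."""

        def open_spider(self, spider):
            self.conn = sqlite3.connect("scraped.db")  # hypothetical backend
            self.conn.execute("CREATE TABLE IF NOT EXISTS items_tmp (data TEXT)")

        def process_item(self, item, spider):
            # Assumes dict-like items; serialization is illustrative only.
            self.conn.execute("INSERT INTO items_tmp (data) VALUES (?)",
                              (str(dict(item)),))
            return item

        def close_spider(self, spider):
            # Error checks go here; if they pass, make the table permanent.
            # The problem: this runs even when the spider closed on an error.
            self.conn.execute("ALTER TABLE items_tmp RENAME TO items")
            self.conn.commit()
            self.conn.close()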
Upvotes: 1
Views: 214
Reputation: 356
I was able to do this using Scrapy's signals. I'm posting the answer here in case someone else runs into this.

I connected a handler for the spider_error signal in the spider itself. The handler sets a flag on the spider to indicate that an error was encountered. In the pipeline's close_spider() method, I check whether the error flag is set on the spider to distinguish between a normal close and an error close.
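A minimal sketch of that approach; the spider name, pipeline class, and flag attribute (encountered_error) are illustrative choices, not fixed names:

    import scrapy
    from scrapy import signals

    class MySpider(scrapy.Spider):
        name = "my_spider"  # hypothetical spider name

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            spider.encountered_error = False  # flag read later by the pipeline
            crawler.signals.connect(spider.on_spider_error,
                                    signal=signals.spider_error)
            return spider

        def on_spider_error(self, failure, response, spider):
            # Called whenever a spider callback raises an exception.
            self.encountered_error = True

    class TempTablePipeline:
        def close_spider(self, spider):
            if getattr(spider, "encountered_error", False):
                # Error close: data may be incomplete, so keep (or drop)
                # the temporary table instead of making it permanent.
                return
            # Normal close: run the error checks and, if they pass,
            # promote the temporary table to permanent.
            ...

With CLOSESPIDER_ERRORCOUNT set to 1, the first callback exception fires spider_error before the spider shuts down, so the flag is already set by the time close_spider() runs.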
Upvotes: 1