Reputation: 1023
I got stuck trying to work out a solution...
My Scrapy spider crawls a site, collects some data into an item, and then yields a request based on the scraped data to crawl another site in order to complete the item.
What happens is that the second URL sometimes returns an error, so the item never gets output.
How can I carry the item to the errback function?
Thanks in advance.
Upvotes: 1
Views: 1926
Reputation: 59574
From the docs:
errback (callable) – a function that will be called if any exception was raised while processing the request. This includes pages that failed with 404 HTTP errors and such. It receives a Twisted Failure instance as first parameter.
Try using a lambda, binding the item as a default argument:
...
yield Request(..., errback=lambda failure, item=item: self.on_error(failure, item))

def on_error(self, failure, item):
    ...
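The item=item default argument is what makes this work: a lambda created inside a loop late-binds the loop variable, so without it every errback would see the last item. A minimal plain-Python sketch of the binding (no Scrapy required; the failure argument here is just a placeholder string and on_error is illustrative):

```python
def on_error(failure, item):
    # In a real spider this might log the failure and still yield
    # the partially completed item.
    return {"failure": failure, "item": item}

# Simulate creating one errback per scraped item inside a loop.
items = [{"id": 1}, {"id": 2}]
errbacks = [lambda failure, item=item: on_error(failure, item)
            for item in items]

# The `item=item` default freezes the current value at
# lambda-creation time, so each errback keeps its own item.
first = errbacks[0]("connection timed out")
```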
Upvotes: 4