Reputation: 9759
In the spider I'm building, I'm required to log in to the website before performing any requests (which is quite simple), and then I go through a loop that performs a few thousand requests.
However, on this website in particular, if I do not log out, I get a 10 minute penalty before I can log in again. So I've tried to log out after the loop is done, with a lower priority, like this:
def parse_after_login(self, response):
    for item in [long_list]:
        yield scrapy.Request(..., callback=self.parse_result, priority=100)
    # After all requests have been made, perform logout:
    yield scrapy.Request('/logout/', callback=self.parse_logout, priority=0)
However, there is no guarantee that the logout request won't be processed before the other requests are done, and a premature logout would invalidate them.
I have found no way of performing a new request with the spider_closed signal.
How can I perform a new request after all other requests are completed?
Upvotes: 2
Views: 1901
Reputation: 18799
You can use the spider_idle signal, which lets you send a request once the spider has stopped processing everything.
Once you connect a method to the spider_idle signal with:
self.crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)
you can use the self.spider_idle method to run final tasks once the spider has stopped processing everything:
class MySpider(Spider):
    ...
    logged_out = False  # flag so the logout request is only scheduled once
    ...

    def spider_idle(self, spider):
        # called when there are no more pending requests in the scheduler
        if not self.logged_out:
            self.logged_out = True
            req = Request('someurl', callback=self.parse_logout)
            self.crawler.engine.crawl(req, spider)
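For reference, a minimal self-contained sketch of how this wiring can look. The URLs, spider name and parse callbacks below are placeholders, and the signal is connected in a from_crawler override because self.crawler is not yet available in __init__. Note that on recent Scrapy versions the spider argument to engine.crawl() has been deprecated, so you may need self.crawler.engine.crawl(req) instead:

import scrapy
from scrapy import signals


class MySpider(scrapy.Spider):
    # placeholder name and URLs; replace with your own
    name = 'myspider'
    start_urls = ['https://example.com/login/']
    logged_out = False

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # self.crawler is not set in __init__, so connect the signal here
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
        return spider

    def parse(self, response):
        # login / scraping logic goes here
        pass

    def spider_idle(self, spider):
        # fired when the scheduler runs out of requests;
        # schedule the logout exactly once, then let the spider close
        if not self.logged_out:
            self.logged_out = True
            req = scrapy.Request('https://example.com/logout/',
                                 callback=self.parse_logout)
            self.crawler.engine.crawl(req, spider)

    def parse_logout(self, response):
        self.logger.info('logged out')

Because the idle handler schedules a new request, the spider is no longer idle and keeps running until the logout response has been processed, after which it closes normally.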
Upvotes: 6