Stan van Rooy

Reputation: 346

Make a final Request before closing Scrapy spider

The problem is quite simple: there is a spider which logs in to a website, crawls some data and then quits. The required behavior is to log in, crawl the data and then log out.

Hard-coding this is not possible, since there are about 60 spiders, all of which inherit from a BaseSpider.

I've tried using signals by connecting a logout function to the spider_idle signal, which simply sends a request to a logout URL that each spider has to provide. I couldn't get it to work though: the logout function was never called, and I haven't been able to figure out why.

Here is the code:

    from scrapy import Request, signals

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(BaseSpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
        return spider

    def spider_idle(self, spider):
        if not self.logged_out:
            # schedule one final request to the logout URL before the spider closes
            self.crawler.engine.crawl(Request(self.logout_url, callback=self.logout), spider)

    def logout(self, response):
        self.logged_out = True

I don't see why this wouldn't work. As I understand it, the spider_idle signal is fired when there are no more requests in the queue, i.e. when the spider is done.

Upvotes: 1

Views: 839

Answers (1)

Umair Ayub

Reputation: 21201

I have been using Scrapy for many years and have ended up in a scenario like yours.

The only solution to achieve your goal is to use Python's requests library inside the spider_closed method.

spider_idle etc. don't help here.
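
A minimal sketch of that pattern, assuming the logout endpoint accepts a plain GET and that you pass along any session cookies yourself (logout_url is the attribute from the question):

    import requests
    from scrapy import Spider, signals


    class BaseSpider(Spider):

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super(BaseSpider, cls).from_crawler(crawler, *args, **kwargs)
            # spider_closed fires once the crawl has finished, so a blocking
            # call with the requests library is fine here
            crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
            return spider

        def spider_closed(self, spider):
            # Scrapy is no longer scheduling requests at this point, so log out
            # with a plain HTTP call instead of a scrapy.Request
            requests.get(self.logout_url)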

Upvotes: 2
