Bociek

Reputation: 1265

Stop Scrapy request pipeline for a few minutes and retry

I am scraping a single domain using Scrapy and the Crawlera proxy. Sometimes, due to Crawlera issues (technical breaks), I get a 407 status code and can't scrape any page. Is it possible to stop the request pipeline for 10 minutes and then resume the spider? To be clear, I don't want to defer a single request, but to stop everything (except maybe item processing) for 10 minutes until they resolve the problem. I am running 10 concurrent threads.

Upvotes: 0

Views: 294

Answers (1)

Granitosaurus

Reputation: 21436

Yes you can. There are a few ways of doing this, but the most obvious is to simply insert some blocking code in a downloader middleware:

# middlewares.py
import time


class BlockMiddleware:

    def process_response(self, request, response, spider):
        if response.status == 407:
            print('beep boop, taking a nap')
            time.sleep(60)
        return response

and activate it:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.BlockMiddleware': 100,
}

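Since `time.sleep()` in a downloader middleware blocks Scrapy's single reactor thread, every concurrent request is paused at once, which is what the question asks for. The pause-then-retry idea itself can be sketched in plain Python, independent of Scrapy; the `fetch` callable and the injectable `sleep` parameter here are illustrative assumptions, not Scrapy APIs:

```python
import time


def fetch_with_pause(fetch, url, pause_seconds=600, max_tries=3, sleep=time.sleep):
    """Call fetch(url); on a 407 (proxy error), pause and retry.

    fetch: hypothetical callable returning (status, body), standing in
           for a real HTTP request.
    sleep: injectable so tests don't actually wait ten minutes.
    """
    status, body = fetch(url)
    for _ in range(max_tries - 1):
        if status != 407:
            break
        sleep(pause_seconds)        # block everything for the pause window
        status, body = fetch(url)   # then retry the same URL
    return status, body
```

Injecting `sleep` keeps the sketch testable; in a real middleware the blocking call alone is enough, since nothing else can run while it sleeps.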
Upvotes: 1
