Reputation: 1265
I am scraping a single domain using Scrapy and the Crawlera proxy, and sometimes, due to Crawlera issues (a technical break), I get a 407 status code and can't scrape any page. Is it possible to stop the request pipeline for 10 minutes and then restart the spider? To be clear, I do not want to defer the request, but to stop everything (maybe except Item processing) for 10 minutes until they resolve the problem. I am running 10 concurrent threads.
Upvotes: 0
Views: 294
Reputation: 21436
Yes you can. There are a few ways of doing this, but the most obvious is to simply insert some blocking code:
# middlewares.py
import time

class BlockMiddleware:
    def process_response(self, request, response, spider):
        if response.status == 407:
            print('beep boop, taking a nap')
            time.sleep(60)  # adjust to 600 for the 10 minutes you want
        return response
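One caveat: time.sleep() blocks Twisted's reactor, so the whole process, including item processing, freezes for the duration; a non-blocking sketch follows the activation snippet below.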
and activate it:
# settings.py
DOWNLOADER_MIDDLEWARES = {
'myproject.middlewares.BlockMiddleware': 100,
}
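If you want the crawl paused without freezing the process, another of those "few ways" is to pause the engine and schedule the resume on the reactor instead of sleeping. Below is a minimal sketch, not tested against Crawlera; it assumes the standard crawler.engine.pause()/unpause() API and Twisted's reactor.callLater, and the middleware name is made up. It also uses the 600-second delay you asked for:
# middlewares.py
from twisted.internet import reactor

class PauseOnProxyErrorMiddleware:
    def __init__(self, crawler):
        self.crawler = crawler
        self.paused = False

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_response(self, request, response, spider):
        if response.status == 407 and not self.paused:
            self.paused = True
            # Stop the engine from scheduling new downloads; in-flight
            # responses and item pipelines keep running.
            self.crawler.engine.pause()
            reactor.callLater(600, self.unpause)  # resume after 10 minutes
            # Re-queue the failed request so it is retried after the pause.
            retryreq = request.copy()
            retryreq.dont_filter = True
            return retryreq
        return response

    def unpause(self):
        self.paused = False
        self.crawler.engine.unpause()
Activate it in DOWNLOADER_MIDDLEWARES the same way as above.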
Upvotes: 1