Reputation: 1280
I'm new to Scrapy and not very experienced with Python. I've got a scraper set up to scrape data from a website, but even though I'm using proxies, if the same proxy is used too many times my request is served a page telling me I'm visiting too many pages too quickly (with HTTP status code 200).
Since my scraper sees the page's status code as okay, it doesn't find the needed data and moves on to the next page.
I can detect when these pages are shown via HtmlXPathSelector, but how do I signal Scrapy to retry that request?
Upvotes: 3
Views: 2253
Reputation: 55972
Scrapy comes with a built-in retry middleware. You could subclass it and override the `process_response` method to include a check for whether the page telling you that you're visiting too many pages too quickly is showing up.
Upvotes: 3