john

Reputation: 1280

How do I conditionally retry and rescrape the current page in Scrapy?

I'm new to Scrapy and not very experienced with Python. I have a scraper set up to collect data from a website. Although I'm using proxies, if the same proxy is used too many times, the site serves a page telling me I'm visiting too many pages too quickly, and it does so with HTTP status code 200.

Because my scraper sees the page's status code as OK, it fails to find the needed data and moves on to the next page.

I can detect when these pages are shown using HtmlXPathSelector, but how do I signal Scrapy to retry that page?

Upvotes: 3

Views: 2253

Answers (1)

dm03514

Reputation: 55972

Scrapy comes with a built-in retry middleware (RetryMiddleware). You could subclass it and override the process_response method to check whether the response is the page telling you that you're visiting too many pages too quickly and, if so, retry the request.

Upvotes: 3
