Seminko

Reputation: 163

Using proxies for scraping - how to tell a dead proxy from a website that is blocking you?

I scrape a lot, but so far I've been using a VPN for my scrapes. I would like to start using proxies, but the problem I'm running into is that proxies, especially free ones, are highly unreliable.

How do I tell an issue with the webpage from an issue with the proxy? There are timeouts, connection errors, and similar exceptions, but those happen both when a proxy is bad and when the webpage has a problem.

In other words, how do I know whether I need to rotate a dead proxy, or whether the problem is with the URL I want to scrape and I should stop retrying and skip it?

Upvotes: 0

Views: 484

Answers (1)

Dan Suciu

Reputation: 261

It's hard to tell the difference between a website that's down and a proxy that isn't working, because both can produce the same error (a timeout, a refused connection, or an HTTP error status).

My recommendation is to build a proxy checker: a simple tool that iterates over your proxy list, connects through each proxy, and requests a website that you control (think of a simple Express web server with a single endpoint). Run the checker on a schedule, for example every 30 seconds.

This way, you know the website on the other end isn't down or blocking you (you won't block yourself), so if you get an error, it's almost certainly a proxy error.
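A minimal sketch of the check itself, in Python using only the standard library. The `CHECK_URL` value and the `proxy_is_alive` name are placeholders I made up for illustration; point the URL at whatever endpoint you host yourself.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical endpoint on a server you control; replace with your own.
CHECK_URL = "http://my-check-server.example/ping"


def proxy_is_alive(proxy_url, check_url=CHECK_URL, timeout=5.0):
    """Return True if check_url answers with HTTP 200 when fetched via proxy_url."""
    # Route both http and https requests through the proxy under test.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(check_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Timeouts, refused connections, error statuses, and malformed replies
        # all count as a failed check. Because the endpoint is yours, the
        # failure points at the proxy rather than the site.
        return False


# Example use (not run here):
#   proxy_is_alive("http://203.0.113.10:8080")
```

The function deliberately collapses every failure into `False`; if you want to log why a proxy failed, catch the exception types separately.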

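Once a proxy fails the check, remove it from the list, and add it back later once it comes back online. If a scrape then fails through a proxy that has just passed the check, the problem is most likely with the target URL rather than the proxy.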
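The remove-and-re-add bookkeeping could look roughly like this in Python. `ProxyPool` and `run_forever` are names I invented for the sketch, and `check` is any function that takes a proxy URL and returns True or False (for example, a request to your own endpoint through that proxy). Starting with no proxy trusted until it passes a check is my assumption, not a requirement.

```python
import time


class ProxyPool:
    """Keeps the full proxy list and tracks which proxies passed the last check."""

    def __init__(self, proxies, check):
        self.all_proxies = list(proxies)  # never shrinks, so dead proxies get re-tested
        self.check = check                # callable: proxy_url -> bool
        self.alive = set()                # assumption: nothing is trusted until checked

    def refresh(self):
        """Re-test every known proxy, including ones that failed earlier."""
        self.alive = {p for p in self.all_proxies if self.check(p)}

    def get_alive(self):
        """Return the proxies that are currently usable, in their original order."""
        return [p for p in self.all_proxies if p in self.alive]


def run_forever(pool, interval=30.0):
    """Refresh the pool every `interval` seconds (30 s as suggested above)."""
    while True:
        pool.refresh()
        time.sleep(interval)
```

In practice you would run `run_forever` in a background thread or a separate process and have the scraper pick proxies only from `get_alive()`.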

Upvotes: 1
