Reputation: 11

How to avoid 'HTTP error code:429' while web scraping?

I'm trying to scrape information from Google, and they don't like it. The vector contains 2487 Google URLs, and from each of them I want to get the text of the first result.

I tried to create a loop to slow down the process but I'm very bad at it.

b is the vector that contains all the URLs. First, I tried:

ContentScraper(b, CssPatterns = ".st") -> b

But then, I tried to loop and slow it down, but I have no idea how to.

for (i in b) {
b[i] <- ContentScraper(i, CssPatterns = ".st")}

From the 55th request onward, all I get is the error. Any thoughts on how to avoid it? Thanks.

Upvotes: 0

Answers (2)

Yusuf Ganiyu

Reputation: 1087

One way is to use

Sys.sleep(...)
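As a rough illustration of the idea, here is a minimal sketch of a throttled loop. The helper scrape_slowly and its parameters (scrape_one, delay_sec, jitter_sec) are hypothetical names chosen for this example, not part of Rcrawler, and the 5-second default is a guess rather than a known safe rate for Google.

```r
# Hypothetical helper: call `scrape_one` on each URL, pausing between requests.
# `delay_sec` and `jitter_sec` are illustrative defaults, not recommended values.
scrape_slowly <- function(urls, scrape_one, delay_sec = 5, jitter_sec = 1) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) {
      # Wait before every request except the first, plus a little random jitter
      Sys.sleep(delay_sec + runif(1, 0, jitter_sec))
    }
    # Wrapping in list() keeps the slot even if the scraper returns NULL
    results[i] <- list(scrape_one(urls[i]))
  }
  results
}

# Usage with the call from the question (needs the Rcrawler package and network access):
# library(Rcrawler)
# res <- scrape_slowly(b, function(u) ContentScraper(u, CssPatterns = ".st"))
```

Writing to a separate results list, rather than back into b, keeps the original URLs intact in case some requests need to be repeated.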

Another way, if you're using Puppeteer or Playwright, is to adjust the interval between scrapes with Celery beat.

Is that what you're looking for?

Upvotes: 0

Mariano

Reputation: 401

Insert Sys.sleep(...) at the beginning of the loop body.
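A sketch of what that placement could look like, extended with a simple retry that waits longer after a blocked request. This goes beyond what the answer states: the helper scrape_with_retry, its parameters, and the is_blocked check are all hypothetical, and how a 429 shows up in the result depends on the scraper being used.

```r
# Hypothetical helper: sleep at the top of each attempt, retry when blocked.
# `is_blocked` is a caller-supplied function that inspects one result and
# returns TRUE if it looks like a rate-limit response.
scrape_with_retry <- function(urls, scrape_one, is_blocked,
                              delay_sec = 5, max_tries = 3) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    wait <- delay_sec
    for (attempt in seq_len(max_tries)) {
      Sys.sleep(wait)               # pause first, before the request is sent
      res <- scrape_one(urls[i])
      if (!is_blocked(res)) break   # success: stop retrying this URL
      wait <- wait * 2              # blocked: wait twice as long next time
    }
    results[i] <- list(res)         # keeps the last result, even if still blocked
  }
  results
}

# Usage, assuming the error text appears in the returned value as in the question:
# library(Rcrawler)
# res <- scrape_with_retry(
#   b,
#   function(u) ContentScraper(u, CssPatterns = ".st"),
#   function(x) any(grepl("HTTP error code:429", unlist(x)))
# )
```

Doubling the wait is only one possible policy; a fixed longer pause after a blocked request would fit the same structure.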

Upvotes: 0
