Reputation: 115
I am trying to get the first non-ad result for a simple query on Google:
res = requests.get('https://www.google.com?q=' + query)
Assign any value to query and you will get an error. I have tried adding some headers, but nothing changes. I have also tried adding all the other parameters that Google typically associates with a query, and again nothing changes.
There are no problems if you do the search with Selenium.
The status code is 429, but this seems to be just the standard response to this request: it has nothing to do with my IP, I am not spamming Google, and it does not disappear after a while.
Do you know why this happens? Is there some header I can add, or any other solution, to just see the results as if I were searching that keyword on Google?
Upvotes: 9
Views: 40217
Reputation: 1724
This is one of the most common questions on Stack Overflow, asked 200+ times in the [requests] and [bs4] tags, and pretty much every solution comes down to simply adding a user-agent.
A user-agent is needed to make the request look like a "real" user visit: a bot sends a fake user-agent string to announce itself as a different client.
When no user-agent is passed in the request headers while using the requests library, it defaults to python-requests. Google understands that it's a bot/script, blocks the request (or whatever it does), and you receive different HTML (with some sort of error) and different CSS selectors. Check what your user-agent is. List of user-agents.
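To see what is sent by default, you can print the requests library's own default user-agent string with requests.utils.default_user_agent():

import requests

# Prints something like 'python-requests/2.31.0' -- the string Google
# recognizes as a script rather than a browser.
print(requests.utils.default_user_agent())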
Note: adding a user-agent doesn't mean the problem is fixed; you can still get a 429 (or a different) error, even when rotating user-agents.
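As an illustration, a minimal sketch of rotating user-agents; the pool of strings below is made up for the example, not a recommended set:

import random
import requests

# A hand-picked pool of browser user-agent strings (illustrative values only).
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15',
]

# Pick a different user-agent per request so traffic looks less like one script.
headers = {'User-agent': random.choice(user_agents)}
response = requests.get('https://www.google.com/search', params={'q': 'selenium'}, headers=headers)
print(response.status_code)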
I wrote a dedicated blog post about how to reduce the chance of being blocked while web scraping search engines. In short, you need to pass a user-agent:
headers = {
    'User-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

requests.get('URL', headers=headers)
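Once a 200 comes back, pulling out the first organic result is a separate parsing step. A rough sketch with BeautifulSoup follows; the 'div.g' and 'h3' selectors are an assumption based on Google's past markup and will break whenever Google changes its HTML:

import requests
from bs4 import BeautifulSoup

headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}
html = requests.get('https://www.google.com/search', params={'q': 'selenium'}, headers=headers).text
soup = BeautifulSoup(html, 'html.parser')

# 'div.g' has historically wrapped organic (non-ad) results; treat this
# selector as a guess that needs re-checking against the live page.
first = soup.select_one('div.g')
if first:
    title = first.select_one('h3')
    link = first.select_one('a')
    print(title.get_text() if title else None)
    print(link['href'] if link else None)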
Alternatively, you can achieve the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference is that you don't have to spend time trying to bypass blocks from Google and figuring out why certain things don't work.
Disclaimer: I work for SerpApi.
Upvotes: 1
Reputation: 193218
The HTTP 429 Too Many Requests response status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). The response representations SHOULD include details explaining the condition, and MAY include a Retry-After header indicating how long to wait before making a new request.
When a server is under attack or just receiving a very large number of requests from a single party, responding to each with a 429 status code will consume resources. Therefore, servers are not required to use the 429 status code; when limiting resource usage, it may be more appropriate to just drop connections, or take other steps.
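When a Retry-After header is present on a 429 response, you can honor it directly. A minimal sketch, assuming the header carries a delay in seconds (it can also be an HTTP date, which this sketch does not handle):

import time
import requests

res = requests.get('https://www.google.com/search?q=selenium')
if res.status_code == 429:
    # Fall back to 30 seconds if the header is missing.
    delay = int(res.headers.get('Retry-After', 30))
    time.sleep(delay)
    res = requests.get('https://www.google.com/search?q=selenium')
print(res.status_code)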
However, when I took your code, added a browser User-Agent and the /search path, and executed the same test, I got a perfect result as follows:
Code Block:
import requests

query = "selenium"
# A regular browser User-Agent, so Google doesn't treat the request as a script.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
# Note the /search path, which is Google's actual search endpoint.
url = 'https://www.google.com/search?q=' + query
res = requests.get(url, headers=headers)
print(res)
Console Output:
<Response [200]>
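A small variant of the same request: passing the query through the params argument lets requests URL-encode it, which matters once query contains spaces or special characters:

import requests

query = "selenium web driver"  # spaces get encoded automatically
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
res = requests.get('https://www.google.com/search', params={'q': query}, headers=headers)
print(res.status_code, res.url)  # res.url shows the final encoded URL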
You can find a relevant discussion in Failed to load resource: the server responded with a status of 429 (Too Many Requests) and 404 (Not Found) with ChromeDriver Chrome through Selenium
Upvotes: 5
Reputation: 41
I found the reason why a simple Google query through a plain REST request returns a 429 error.
The user-agent header is one reason, but I tried inserting a user-agent header into the request and the response was still a 429 error.
Finally I found the cause: cookies.
If you want to access Google pages, you first have to get cookies from a basic Google URL such as google.com, trends.google.com, or youtube.com. These basic sites can be accessed with any request method.
import requests

googleTrendsUrl = 'https://google.com'
response = requests.get(googleTrendsUrl)
if response.status_code == 200:
    g_cookies = response.cookies.get_dict()
Then insert these cookies into the search request, along with a user-agent:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
url = 'https://www.google.com/search?q=' + query
res = requests.get(url, headers=headers, cookies=g_cookies)
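The same cookie handling can be written with requests.Session(), which stores cookies from the first response and sends them on later requests automatically; a minimal sketch:

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'})

# The first request collects Google's cookies into the session.
session.get('https://google.com')

# Those cookies are sent automatically on this request.
res = session.get('https://www.google.com/search', params={'q': 'selenium'})
print(res.status_code)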
Upvotes: 4
Reputation: 2689
You are getting status code 429, which means you have sent too many requests in a given amount of time ("rate limiting"). Read about it in more detail here.
Add headers to your request like this:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
So the final request will be:
url = 'https://www.google.com/search?q=' + query
res = requests.get(url, headers=headers)
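Since 429 means rate limiting, it can also help to retry with an increasing delay rather than immediately; a minimal exponential-backoff sketch:

import time
import requests

query = 'selenium'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}
url = 'https://www.google.com/search?q=' + query

for attempt in range(4):
    res = requests.get(url, headers=headers)
    if res.status_code != 429:
        break
    # Back off 1s, 2s, 4s, 8s between retries.
    time.sleep(2 ** attempt)
print(res.status_code)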
Upvotes: 5