Reputation: 143
I'm using Pytrends to extract Google Trends data, like:
from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend.build_payload(kw_list=['bitcoin'], cat=0, timeframe=from_date+' '+today_date)
And it returns an error:
ResponseError: The request failed: Google returned a response with code 429.
It worked yesterday, and for some reason it doesn't work now! The source code from GitHub failed too:
pytrends = TrendReq(hl='en-US', tz=360, proxies = {'https': 'https://34.203.233.13:80'})
How can I fix this? Thanks a lot!
Upvotes: 9
Views: 45948
Reputation: 547
TL;DR: I solved the problem with a custom patch.
The problem comes from Google's bot-recognition system. Like other such systems, it stops serving requests that arrive too frequently from suspicious clients. One of the features used to recognize trustworthy clients is the presence of specific headers generated by the JavaScript code running on the web pages. Unfortunately, the Python requests library cannot provide that level of camouflage, since it does not execute JavaScript at all. So the idea behind my patch is to reuse the headers my browser generates while interacting with Google Trends. Because the browser is logged in to my Google account, those headers are linked to that account, so as far as Google is concerned, I am trustworthy.
I solved in the following way:
from pytrends.request import TrendReq as UTrendReq
GET_METHOD = 'get'

import requests

headers = {
    ...
}


class TrendReq(UTrendReq):
    def _get_data(self, url, method=GET_METHOD, trim_chars=0, **kwargs):
        return super()._get_data(url, method=GET_METHOD, trim_chars=trim_chars,
                                 headers=headers, **kwargs)
Upvotes: 15
Reputation: 13
Now we are facing the same issue again; the following code helps resolve the 429 error.
For interest-over-time data, the code below retries multiple times, switching impersonated browser versions between attempts.
import json
import urllib.parse
from datetime import datetime, timedelta
from curl_cffi import requests
import time


def build_payload(keywords, timeframe='now 7-d', geo='US'):
    token_payload = {
        'hl': 'en-US',
        'tz': '0',
        'req': {
            'comparisonItem': [{'keyword': keyword, 'time': timeframe, 'geo': geo} for keyword in keywords],
            'category': 0,
            'property': ''
        }
    }
    token_payload['req'] = json.dumps(token_payload['req'])
    return token_payload


def convert_to_desired_format(raw_data):
    trend_data = {}
    for entry in raw_data['default']['timelineData']:
        timestamp = int(entry['time'])
        date_time_str = datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
        value = entry['value'][0]
        trend_data[date_time_str] = value
    return trend_data


# Cookies
def get_google_cookies(impersonate_version='chrome110'):
    with requests.Session() as session:
        session.get("https://www.google.com", impersonate=impersonate_version)
        return session.cookies


def fetch_trends_data(keywords, days_ago=7, geo='US', hl='en-US', max_retries=5,
                      browser_version='chrome110', browser_switch_retries=2):
    browser_versions = ['chrome110', 'edge101', 'chrome107', 'chrome104', 'chrome100', 'chrome101', 'chrome99']
    current_browser_version_index = browser_versions.index(browser_version)
    cookies = get_google_cookies(impersonate_version=browser_versions[current_browser_version_index])

    for browser_retry in range(browser_switch_retries + 1):
        data_fetched = False  # Reset data_fetched to False at the beginning of each browser_retry
        with requests.Session() as s:
            # phase 1: token
            for retry in range(max_retries):
                time.sleep(2)
                token_payload = build_payload(keywords)
                url = 'https://trends.google.com/trends/api/explore'
                params = urllib.parse.urlencode(token_payload)
                full_url = f"{url}?{params}"
                response = s.get(full_url, impersonate=browser_versions[current_browser_version_index], cookies=cookies)
                if response.status_code == 200:
                    content = response.text[4:]
                    try:
                        data = json.loads(content)
                        widgets = data['widgets']
                        tokens = {}
                        request = {}
                        for widget in widgets:
                            if widget['id'] == 'TIMESERIES':
                                tokens['timeseries'] = widget['token']
                                request['timeseries'] = widget['request']
                        break  # Break out of the retry loop as we got the token
                    except json.JSONDecodeError:
                        print(f"Failed to decode JSON while fetching token, retrying {retry + 1}/{max_retries}")
                else:
                    print(f"Error {response.status_code} while fetching token, retrying {retry + 1}/{max_retries}")
            else:
                print(f"Exceeded maximum retry attempts ({max_retries}) while fetching token. Exiting...")
                return None

            # phase 2: trends data
            for retry in range(max_retries):
                time.sleep(5)
                req_string = json.dumps(request['timeseries'], separators=(',', ':'))
                encoded_req = urllib.parse.quote(req_string, safe=':,+')
                url = f"https://trends.google.com/trends/api/widgetdata/multiline?hl={hl}&tz=0&req={encoded_req}&token={tokens['timeseries']}&tz=0"
                response = s.get(url, impersonate=browser_versions[current_browser_version_index], cookies=cookies)
                if response.status_code == 200:
                    content = response.text[5:]
                    try:
                        raw_data = json.loads(content)
                        # Convert raw data
                        trend_data = convert_to_desired_format(raw_data)
                        data_fetched = True  # Set data_fetched to True as we have successfully fetched the trend data
                        return trend_data
                    except json.JSONDecodeError:
                        print(f"Failed to decode JSON while fetching trends data, retrying {retry + 1}/{max_retries}")
                else:
                    print(f"Error {response.status_code} while fetching trends data, retrying {retry + 1}/{max_retries}")
            else:
                print(f"Exceeded maximum retry attempts ({max_retries}) while fetching trends data.")

        # change browser
        if not data_fetched and browser_retry < browser_switch_retries:
            time.sleep(5)
            current_browser_version_index = (current_browser_version_index + 1) % len(browser_versions)
            print(f"Switching browser version to {browser_versions[current_browser_version_index]} and retrying...")

    print(f"Exceeded maximum browser switch attempts ({browser_switch_retries}). Exiting...")
    return None


# Example
keywords = ["test"]
trends_data = fetch_trends_data(keywords)
print(trends_data)
The following code fetches the Top and Rising related queries:
import json
import urllib.parse
from datetime import datetime, timedelta
from curl_cffi import requests
import time
import os


def build_payload(keywords, timeframe='now 1-H', geo=''):
    token_payload = {
        'hl': 'en-US',
        'tz': '0',
        'req': {
            'comparisonItem': [{'keyword': keyword, 'time': timeframe, 'geo': geo} for keyword in keywords],
            'category': 0,
            'property': ''
        }
    }
    token_payload['req'] = json.dumps(token_payload['req'])
    return token_payload


def convert_to_desired_format(raw_data):
    trend_data = {'TOP': {}, 'RISING': {}}
    if 'rankedList' in raw_data.get('default', {}):
        for item in raw_data['default']['rankedList']:
            for entry in item.get('rankedKeyword', []):
                query = entry.get('query')
                value = entry.get('value')
                if query and value:
                    link = entry.get('link', '')
                    trend_type = link.split('=')[-1].split('&')[0].upper() if link else None
                    if trend_type in ['TOP', 'RISING']:
                        trend_data[trend_type][query] = value
    return trend_data


def get_google_cookies(impersonate_version='chrome110'):
    with requests.Session() as session:
        session.get("https://www.google.com", impersonate=impersonate_version)
        return session.cookies


def fetch_trends_data(keywords, days_ago=7, geo='US', hl='en-US', max_retries=5,
                      browser_version='chrome110', browser_switch_retries=2):
    browser_versions = ['chrome110', 'edge101', 'chrome107', 'chrome104', 'chrome100', 'chrome101', 'chrome99']
    current_browser_version_index = browser_versions.index(browser_version)
    cookies = get_google_cookies(impersonate_version=browser_versions[current_browser_version_index])

    for browser_retry in range(browser_switch_retries + 1):
        data_fetched = False
        with requests.Session() as s:
            # phase 1: token
            for retry in range(max_retries):
                time.sleep(2)
                token_payload = build_payload(keywords)
                url = 'https://trends.google.com/trends/api/explore'
                params = urllib.parse.urlencode(token_payload)
                full_url = f"{url}?{params}"
                response = s.get(full_url, impersonate=browser_versions[current_browser_version_index], cookies=cookies)
                if response.status_code == 200:
                    content = response.text[4:]
                    try:
                        data = json.loads(content)
                        widgets = data['widgets']
                        tokens = {}
                        request = {}
                        for widget in widgets:
                            if widget['id'] == 'RELATED_QUERIES':
                                tokens['related_queries'] = widget['token']
                                request['related_queries'] = widget['request']
                        break
                    except json.JSONDecodeError:
                        print(f"Failed to decode JSON while fetching token, retrying {retry + 1}/{max_retries}")
                else:
                    print(f"Error {response.status_code} while fetching token, retrying {retry + 1}/{max_retries}")
            else:
                print(f"Exceeded maximum retry attempts ({max_retries}) while fetching token. Exiting...")
                return None

            # phase 2: trends data
            for retry in range(max_retries):
                time.sleep(5)
                req_string = json.dumps(request['related_queries'], separators=(',', ':'))
                encoded_req = urllib.parse.quote(req_string, safe=':,+')
                url = f"https://trends.google.com/trends/api/widgetdata/relatedsearches?hl={hl}&tz=0&req={encoded_req}&token={tokens['related_queries']}&tz=0"
                response = s.get(url, impersonate=browser_versions[current_browser_version_index], cookies=cookies)
                print(f"URL: {url}")
                if response.status_code == 200:
                    content = response.text[5:]
                    try:
                        file_name = f"trends_data_{os.getpid()}.json"
                        with open(file_name, 'w') as json_file:
                            json_file.write(content)
                        # Remove first line from the file
                        with open(file_name, 'r') as f:
                            lines = f.readlines()[1:]
                        with open(file_name, 'w') as f:
                            f.writelines(lines)
                        # Load JSON content from the file
                        with open(file_name, 'r') as json_file:
                            data = json.load(json_file)
                        # Extract and print queries and values from both rankedLists separately
                        for item in data['default']['rankedList'][0]['rankedKeyword']:
                            print(f"Top: {item['query']}, Value: {item['value']}")
                        for item in data['default']['rankedList'][1]['rankedKeyword']:
                            print(f"Rising: {item['query']}, Value: {item['value']}")
                        return content
                    except json.JSONDecodeError:
                        print(f"Failed to decode JSON while fetching trends data, retrying {retry + 1}/{max_retries}")
                else:
                    print(f"Error {response.status_code} while fetching trends data, retrying {retry + 1}/{max_retries}")
            else:
                print(f"Exceeded maximum retry attempts ({max_retries}) while fetching trends data.")

        if not data_fetched and browser_retry < browser_switch_retries:
            time.sleep(5)
            current_browser_version_index = (current_browser_version_index + 1) % len(browser_versions)
            print(f"Switching browser version to {browser_versions[current_browser_version_index]} and retrying...")

    print(f"Exceeded maximum browser switch attempts ({browser_switch_retries}). Exiting...")
    return None


# Example
keywords = ["test"]
trends_data = fetch_trends_data(keywords)
print(trends_data)
Upvotes: 1
Reputation: 11
I was having the same issue and did something very similar to Antonio Ercole De Luca. For me, however, the issue was with the cookies, not the headers.
I created a subclass like Antonio did, but this time modifying the cookie method:
cookies = {
    "SEARCH_SAMESITE": "####",
    "SID": "####",
    ...
}


class CookieTrendReq(TrendReq):
    def GetGoogleCookie(self):
        return dict(filter(lambda i: i[0] == 'NID', cookies.items()))
And I used the same method to get the cookies as he did to get the headers.
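Filling that `cookies` dict by hand is tedious; one option (a small helper of my own, not part of pytrends) is to copy the raw `Cookie:` request-header value from the browser's developer tools and split it:

```python
def parse_cookie_header(raw):
    """Turn a raw 'Cookie:' header value, as copied from the browser's
    dev tools, into a name -> value dict (the shape used above)."""
    cookies = {}
    for pair in raw.split('; '):
        # Each pair looks like "NAME=VALUE"; the value may itself contain '='.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies

# Placeholder values, not real Google cookies:
cookies = parse_cookie_header("SEARCH_SAMESITE=abc; SID=xyz; NID=501")
print(cookies['NID'])
```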
Upvotes: 1
Reputation: 39
I had the same problem even after updating the module with pip install --upgrade --user git+https://github.com/GeneralMills/pytrends
and restarting Python.
But the issue was solved with the method below. Instead of
pytrends = TrendReq(hl='en-US', tz=360, timeout=(10,25), proxies=['https://34.203.233.13:80',], retries=2, backoff_factor=0.1, requests_args={'verify':False})
just run:
pytrend = TrendReq()
Hope this can be helpful!
Upvotes: 3
Reputation: 1587
This one took a while, but it turned out the library just needed an update. You can check out a few of the approaches I posted here, both of which resulted in status-429 responses:
https://github.com/GeneralMills/pytrends/issues/243
Ultimately, I was able to get it working again by running the following command from my bash prompt:
Run:
pip install --upgrade --user git+https://github.com/GeneralMills/pytrends
to get the latest version.
Hope that works for you too.
EDIT:
If you can't upgrade from source you may have some luck with:
pip install pytrends --upgrade
Also, make sure you're running git as an administrator if on Windows.
Upvotes: 11
Reputation: 23
After running the upgrade command via pip install, you should restart the Python kernel and reload the pytrends library.
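In a Jupyter/IPython session, `importlib.reload` can pick up the upgraded code without a full kernel restart (a restart is still the more reliable option). The pattern is sketched below with a stdlib module as a stand-in, since it is identical for any module:

```python
import importlib
import json  # stand-in; in practice: import pytrends.request

# reload() re-executes the module's source and returns the same module
# object, so names bound afterwards see the upgraded code.
reloaded = importlib.reload(json)
print(reloaded is json)  # True
```

With pytrends you would reload `pytrends.request` and then re-run `from pytrends.request import TrendReq`.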
Upvotes: 1