Reputation: 3249
I have been trying pytrends and discovered that interest_by_region with city-level resolution
is only implemented for the US:
if self.geo == '':
    self.interest_by_region_widget['request'][
        'resolution'] = resolution
elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']:
    self.interest_by_region_widget['request'][
        'resolution'] = resolution
I tried to work out what is missing in the code for other countries, but I could not find it. All I know from the code above is that it only works for the US. I also know that I can select city-level data on the Google Trends website. Can someone help me find which part of pytrends I would have to implement?
EDIT:
I implemented the suggestion from @mcskinner (+1), which really makes things simpler (but I hit the same problem as with my own hack). My code is now:
import json
import pandas as pd
from pytrends.request import TrendReq
#from request import TrendReq
class MyTrendReq(TrendReq):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def interest_by_region(self, resolution='COUNTRY', inc_low_vol=False,
                           inc_geo_code=False):
        """Request data from Google's Interest by Region section and return a dataframe"""
        # make the request
        region_payload = dict()
        if self.geo == '':
            self.interest_by_region_widget['request']['resolution'] = resolution
        elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']:
            self.interest_by_region_widget['request']['resolution'] = resolution
        elif len(self.geo) == 2 and resolution in ['CITY', 'REGION']:
            self.interest_by_region_widget['request']['resolution'] = resolution
        self.interest_by_region_widget['request'][
            'includeLowSearchVolumeGeos'] = inc_low_vol
        # convert to string as requests will mangle
        region_payload['req'] = json.dumps(
            self.interest_by_region_widget['request'])
        region_payload['token'] = self.interest_by_region_widget['token']
        region_payload['tz'] = self.tz
        # parse returned json
        req_json = self._get_data(
            url=TrendReq.INTEREST_BY_REGION_URL,
            method=TrendReq.GET_METHOD,
            trim_chars=5,
            params=region_payload,
        )
        df = pd.DataFrame(req_json['default']['geoMapData'])
        if df.empty:
            return df
        # keep the location name, geo code and values, indexed by location name
        df = df[['geoName', 'geoCode', 'value']].set_index(
            ['geoName']).sort_index()
        # split list columns into separate ones, remove brackets and split on comma
        result_df = df['value'].apply(lambda x: pd.Series(
            str(x).replace('[', '').replace(']', '').split(',')))
        if inc_geo_code:
            result_df['geoCode'] = df['geoCode']
        # rename each column with its search term
        for idx, kw in enumerate(self.kw_list):
            result_df[kw] = result_df[idx].astype('int')
            del result_df[idx]
        return result_df
#import pytrends
if __name__ == "__main__":
    pytrend = MyTrendReq()
    pytrend.build_payload(kw_list=['BMW'], geo='BR', timeframe='2019-03-01 2020-03-02')
    # df = pytrend.interest_by_region(resolution='REGION', inc_low_vol=True, inc_geo_code=True)
    df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)
I get the following error (something seems to be missing, even though I can run this kind of search manually in Google Trends):
runfile('/home/daniel/Documents/caju/testingPytrendsStackoverflow.py', wdir='/home/daniel/Documents/caju')
Traceback (most recent call last):
File "<ipython-input-8-3a8c4f9b3a66>", line 1, in <module>
runfile('/home/daniel/Documents/caju/testingPytrendsStackoverflow.py', wdir='/home/daniel/Documents/caju')
File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/daniel/Documents/caju/testingPytrendsStackoverflow.py", line 72, in <module>
df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)
File "/home/daniel/Documents/caju/testingPytrendsStackoverflow.py", line 53, in interest_by_region
df = df[['geoName', 'geoCode', 'value']].set_index(
File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2986, in __getitem__
indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
return self._get_listlike_indexer(obj, axis, **kwargs)[1]
File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1092, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1185, in _validate_read_indexer
raise KeyError("{} not in index".format(not_found))
KeyError: "['geoCode'] not in index"
If I replace this line in my code
df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)
with
df = pytrend.interest_by_region(resolution='REGION', inc_low_vol=True, inc_geo_code=True)
it works.
EDIT 2:
@mcskinner is right.
If I set inc_geo_code=False
and comment out
# df = df[['geoName', 'geoCode', 'value']].set_index(
#     ['geoName']).sort_index()
it works, but I lose the city information:
BMW
0 100
1 90
2 88
3 88
4 84
.. ...
105 43
106 43
107 42
108 42
109 38
The question is: where should I add the missing geocode information for Brazil?
Upvotes: 1
Views: 1986
Reputation: 11
There is a small bug in the pytrends source code: city-level results have no geocodes associated with them, so the 'geoCode' column does not exist.
To fix the problem, change the line
df = df[['geoName', 'geoCode', 'value']].set_index(['geoName']).sort_index()
to
df = df[['geoName', 'coordinates', 'value']].set_index(['geoName']).sort_index()
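A more defensive variant of this change is to pick whichever locator column the response actually contains. The sketch below is not pytrends code: geo_map_to_frame is a hypothetical helper, the sample rows are made up, and it assumes (as the fix above implies) that city rows carry a 'coordinates' field where region rows carry 'geoCode'.

```python
import pandas as pd


def geo_map_to_frame(geo_map_data, kw_list, inc_geo_code=False):
    """Hypothetical helper: turn geoMapData-style rows into a DataFrame.

    Assumes each row has 'geoName', a 'value' list with one entry per
    keyword, and either 'geoCode' or 'coordinates' (an assumption based
    on the fix above, not a documented contract).
    """
    df = pd.DataFrame(geo_map_data)
    if df.empty:
        return df
    # Use whichever locator column is present instead of hard-coding one.
    locator = next(
        (col for col in ('geoCode', 'coordinates') if col in df.columns),
        None,
    )
    df = df.set_index('geoName').sort_index()
    # One column per keyword, taken straight from the 'value' lists.
    result = pd.DataFrame(df['value'].tolist(), index=df.index, columns=kw_list)
    if inc_geo_code and locator is not None:
        result[locator] = df[locator].to_numpy()
    return result


# Made-up rows shaped like a city-level response.
sample_rows = [
    {'geoName': 'Recife', 'coordinates': {'lat': -8.05, 'lng': -34.9}, 'value': [88]},
    {'geoName': 'Curitiba', 'coordinates': {'lat': -25.43, 'lng': -49.27}, 'value': [100]},
]
city_df = geo_map_to_frame(sample_rows, ['BMW'], inc_geo_code=True)
```

With a helper like this, the city names stay in the index, so they are not lost when the geocode is unavailable.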
Upvotes: 1
Reputation: 2748
Right after the code you identified, as part of the same if/elif branching, you could add an additional branch for all non-global and non-US regions.
if self.geo == '':
    self.interest_by_region_widget['request']['resolution'] = resolution
elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']:
    self.interest_by_region_widget['request']['resolution'] = resolution
elif len(self.geo) == 2 and resolution in ['CITY', 'REGION']:
    self.interest_by_region_widget['request']['resolution'] = resolution
The condition on length 2 is a bit of a hack to identify countries. You could also get rid of the if condition and just always try to use the resolution.
self.interest_by_region_widget['request']['resolution'] = resolution
Some combinations are now invalid (REGION breakdown of a METRO), and Google Trends will fail for those. You would still need to be careful to handle those or only send valid combinations, but this would give you the freedom to do that.
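One way to only send valid combinations is to check them before building the request. This is a minimal sketch: check_resolution is a hypothetical helper, and the allowed sets simply mirror the branches shown above. They are an assumption, not a verified list of what Google Trends accepts.

```python
# Allowed resolutions per kind of geo, mirroring the if/elif branches above.
# These sets are assumptions for illustration only.
US_RESOLUTIONS = {'DMA', 'CITY', 'REGION'}
COUNTRY_RESOLUTIONS = {'CITY', 'REGION'}


def check_resolution(geo, resolution):
    """Return resolution if the (geo, resolution) pair looks valid, else raise."""
    if geo == '':
        # Global search: pass the resolution through unchanged.
        return resolution
    if geo == 'US':
        allowed = US_RESOLUTIONS
    elif len(geo) == 2:
        # Two-letter codes are treated as countries (the same hack as above).
        allowed = COUNTRY_RESOLUTIONS
    else:
        # Sub-country geos: nothing is assumed to be valid here.
        allowed = set()
    if resolution not in allowed:
        raise ValueError(
            f"resolution {resolution!r} is not allowed for geo {geo!r}"
        )
    return resolution
```

Calling check_resolution(self.geo, resolution) before assigning the widget's resolution would turn an invalid combination into a local ValueError rather than a failed request.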
Note that all of these require modifying the library code. To do it yourself, you would want to create your own subclass of TrendReq and override the interest_by_region method with your own modified copy.
from pytrends.request import TrendReq
class MyTrendReq(TrendReq):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    def interest_by_region(self, resolution='COUNTRY', inc_low_vol=False,
                           inc_geo_code=False):
        # Your modified copy goes here.
        raise NotImplementedError
Upvotes: 2