DanielTheRocketMan

Reputation: 3249

Pytrends only supports city-level interest_by_region for the USA (and without geocodes)

I have been trying pytrends and discovered that interest_by_region with resolution='CITY' is only implemented for the USA:

if self.geo == '':
    self.interest_by_region_widget['request'][
        'resolution'] = resolution
elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']:
    self.interest_by_region_widget['request'][
        'resolution'] = resolution

I tried to find out what is missing in the code for other countries, but I could not. All I know, based on the code above, is that it only works for the USA. I also know that I can select city level for other countries on the Google Trends website. Can someone help me find which part of pytrends I have to implement?
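To see why a Brazilian city request falls through, here is a minimal standalone sketch of the branch quoted above. upstream_resolution is a hypothetical helper, not part of pytrends: it returns the resolution that the branch would write into the widget request, or None when no branch matches and the request is left unchanged.

```python
from typing import Optional


def upstream_resolution(geo: str, resolution: str) -> Optional[str]:
    """Mirror the if/elif branch quoted above (hypothetical helper, not pytrends API).

    Returns the resolution that would be written into the widget request,
    or None when no branch matches and the request is left unchanged.
    """
    if geo == '':
        # Worldwide request: the resolution is passed through.
        return resolution
    if geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']:
        # Only the US gets sub-country resolutions.
        return resolution
    # Any other country (e.g. 'BR') falls through: nothing is set.
    return None


print(upstream_resolution('US', 'CITY'))  # CITY
print(upstream_resolution('BR', 'CITY'))  # None
```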

EDIT:

I implemented @mcskinner's suggestion (+1), which really makes things simpler, but I get the same problem as with my own hack. My code is now:

import json

import pandas as pd                        
from pytrends.request import TrendReq

#from request import TrendReq

class MyTrendReq(TrendReq):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def interest_by_region(self, resolution='COUNTRY', inc_low_vol=False,
                           inc_geo_code=False):
        """Request data from Google's Interest by Region section and return a dataframe"""

        # make the request
        region_payload = dict()

        if self.geo == '': 
            self.interest_by_region_widget['request']['resolution'] = resolution 
        elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']: 
            self.interest_by_region_widget['request']['resolution'] = resolution 
        elif len(self.geo) == 2 and resolution in ['CITY', 'REGION']:
            self.interest_by_region_widget['request']['resolution'] = resolution        

        self.interest_by_region_widget['request'][
            'includeLowSearchVolumeGeos'] = inc_low_vol

        # convert to string as requests will mangle
        region_payload['req'] = json.dumps(
            self.interest_by_region_widget['request'])
        region_payload['token'] = self.interest_by_region_widget['token']
        region_payload['tz'] = self.tz

        # parse returned json
        req_json = self._get_data(
            url=TrendReq.INTEREST_BY_REGION_URL,
            method=TrendReq.GET_METHOD,
            trim_chars=5,
            params=region_payload,
        )
        df = pd.DataFrame(req_json['default']['geoMapData'])
        if (df.empty):
            return df

        # keep the name, code and value columns, indexed and sorted by geoName
        df = df[['geoName', 'geoCode', 'value']].set_index(
            ['geoName']).sort_index()
        # split list columns into separate ones, remove brackets and split on comma
        result_df = df['value'].apply(lambda x: pd.Series(
            str(x).replace('[', '').replace(']', '').split(',')))
        if inc_geo_code:
            result_df['geoCode'] = df['geoCode']

        # rename each column with its search term
        for idx, kw in enumerate(self.kw_list):
            result_df[kw] = result_df[idx].astype('int')
            del result_df[idx]

        return result_df


if __name__ == "__main__":
    pytrend = MyTrendReq()
    pytrend.build_payload(kw_list=['BMW'], geo='BR', timeframe='2019-03-01 2020-03-02')
    # df = pytrend.interest_by_region(resolution='REGION', inc_low_vol=True, inc_geo_code=True)
    df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)

I get the following error (it seems that something is missing, even though I can do this kind of search manually in Google Trends):

runfile('/home/daniel/Documents/caju/testingPytrendsStackoverflow.py', wdir='/home/daniel/Documents/caju')
Traceback (most recent call last):

  File "<ipython-input-8-3a8c4f9b3a66>", line 1, in <module>
    runfile('/home/daniel/Documents/caju/testingPytrendsStackoverflow.py', wdir='/home/daniel/Documents/caju')

  File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/usr/lib/python3/dist-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/daniel/Documents/caju/testingPytrendsStackoverflow.py", line 72, in <module>
    df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)

  File "/home/daniel/Documents/caju/testingPytrendsStackoverflow.py", line 53, in interest_by_region
    df = df[['geoName', 'geoCode', 'value']].set_index(

  File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2986, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)

  File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]

  File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1092, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing

  File "/home/daniel/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1185, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))

KeyError: "['geoCode'] not in index"

If I replace this line in my code

df = pytrend.interest_by_region(resolution='CITY', inc_low_vol=True, inc_geo_code=True)

with

df = pytrend.interest_by_region(resolution='REGION', inc_low_vol=True, inc_geo_code=True)

it works.

EDIT 2: @mcskinner is right. If I set inc_geo_code=False and comment out

# df = df[['geoName', 'geoCode', 'value']].set_index(
#     ['geoName']).sort_index()

it works, but I lose the city information (the index is just 0 to 109 instead of the city names):

     BMW
0    100
1     90
2     88
3     88
4     84
..   ...
105   43
106   43
107   42
108   42
109   38

So where should I add the missing geocode information for Brazil?
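One way to keep the city names without any geocode is to index by geoName instead of dropping the selection step. The sketch below works on the raw list of dicts held in req_json['default']['geoMapData'] in the code above, assuming each row has a geoName string and a value list with one number per keyword (an assumption about the response shape). rows_by_city is a hypothetical helper, and the sample rows are made up for illustration, not real Google Trends data.

```python
from typing import Dict, List


def rows_by_city(geo_map_data: List[dict], kw_list: List[str]) -> Dict[str, Dict[str, int]]:
    """Map each geoName to {keyword: value}, ignoring geoCode entirely.

    Assumes every row has 'geoName' and a 'value' list aligned with kw_list.
    Duplicate city names would overwrite each other in this sketch.
    """
    result = {}
    for row in geo_map_data:
        result[row['geoName']] = {kw: int(v) for kw, v in zip(kw_list, row['value'])}
    return result


# Hypothetical rows shaped like city-level results: note there is no 'geoCode' key.
sample = [
    {'geoName': 'City A', 'value': [100]},
    {'geoName': 'City B', 'value': [90]},
]
print(rows_by_city(sample, ['BMW']))
# {'City A': {'BMW': 100}, 'City B': {'BMW': 90}}
```

In the pandas version, the equivalent is to select only ['geoName', 'value'] before set_index(['geoName']), so the city names survive as the index of result_df.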

Upvotes: 1

Views: 1986

Answers (2)

Shilin Jia

Reputation: 11

There is a small bug in pytrends' source code: city-level results have no geocodes associated with them, so selecting the geoCode column fails.

To fix the problem, change the line

df = df[['geoName', 'geoCode', 'value']].set_index(['geoName']).sort_index()

to

df = df[['geoName', 'coordinates', 'value']].set_index(['geoName']).sort_index()
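With this change, inc_geo_code=True would still fail further down, because the method also copies df['geoCode'] into the result. A hedged sketch of choosing the location column from whatever the response actually contains, so one code path can serve both REGION and CITY requests: location_column and columns_to_select are hypothetical helpers, and the preference order (geoCode first, then coordinates) is an assumption based on the question and this answer.

```python
from typing import Iterable, List, Optional


def location_column(columns: Iterable[str]) -> Optional[str]:
    """Pick the column that identifies each place, if any.

    Prefers 'geoCode' (present for REGION results, per the question) and
    falls back to 'coordinates' (suggested above for CITY results).
    """
    available = set(columns)
    for name in ('geoCode', 'coordinates'):
        if name in available:
            return name
    return None


def columns_to_select(columns: Iterable[str]) -> List[str]:
    """Columns to pass to df[...] before set_index(['geoName'])."""
    columns = list(columns)
    loc = location_column(columns)
    return ['geoName'] + ([loc] if loc else []) + ['value']


print(columns_to_select(['geoName', 'geoCode', 'value']))
# ['geoName', 'geoCode', 'value']
print(columns_to_select(['coordinates', 'geoName', 'value']))
# ['geoName', 'coordinates', 'value']
```

Inside interest_by_region you would then write df = df[columns_to_select(df.columns)].set_index(['geoName']).sort_index(), and, when inc_geo_code is set, copy df[location_column(df.columns)] only if it is not None.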

Upvotes: 1

mcskinner

Reputation: 2748

Right after the code you identified, as part of the same if/elif branching, you could add an additional branch for all non-global and non-US regions.

if self.geo == '': 
    self.interest_by_region_widget['request']['resolution'] = resolution 
elif self.geo == 'US' and resolution in ['DMA', 'CITY', 'REGION']: 
    self.interest_by_region_widget['request']['resolution'] = resolution 
elif len(self.geo) == 2 and resolution in ['CITY', 'REGION']:
    self.interest_by_region_widget['request']['resolution'] = resolution
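The extended branching can also be expressed as a small lookup table, which is easy to check in isolation. This is a hypothetical, table-driven variant of the logic inside the overridden method (patched_resolution and SUB_RESOLUTIONS are not pytrends names); the '*' entry applies the same two-letter-country heuristic the answer uses.

```python
from typing import Optional

# Hypothetical table of sub-country resolutions per geo.
# 'US' mirrors the upstream list; '*' applies to any other two-letter country code.
SUB_RESOLUTIONS = {
    'US': {'DMA', 'CITY', 'REGION'},
    '*': {'CITY', 'REGION'},
}


def patched_resolution(geo: str, resolution: str) -> Optional[str]:
    """Return the resolution to send, or None to leave the request unchanged."""
    if geo == '':
        return resolution  # worldwide: pass through unchanged
    allowed = SUB_RESOLUTIONS.get(geo)
    if allowed is None and len(geo) == 2:
        allowed = SUB_RESOLUTIONS['*']  # other countries, e.g. 'BR'
    if allowed and resolution in allowed:
        return resolution
    return None


print(patched_resolution('BR', 'CITY'))  # CITY
```

Supporting a country-specific exception later then means adding one dictionary entry rather than another elif.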

The condition on length 2 is a bit of a hack to identify countries. You could also drop the condition entirely and always send the requested resolution.

self.interest_by_region_widget['request']['resolution'] = resolution

Some combinations are then invalid (for example, a REGION breakdown of a metro area), and Google Trends will fail for those. You would still need to handle those carefully or send only valid combinations, but this gives you the freedom to do so.
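If you always send the resolution, a guard can reject combinations you already know are invalid before the request goes out. This sketch is hypothetical: it guesses the level of a geo code from the number of hyphen-separated parts (for example BR as a country, BR-SP as a region, US-NY-501 as a metro), which is a heuristic assumption rather than a documented rule, and it blocks only the one invalid case named above.

```python
def geo_level(geo: str) -> str:
    """Guess the level of a geo code from its hyphen-separated parts (heuristic)."""
    if geo == '':
        return 'WORLD'
    parts = geo.split('-')
    return {1: 'COUNTRY', 2: 'REGION'}.get(len(parts), 'METRO')


# Known-bad (level, resolution) pairs; extend this as you discover more.
INVALID = {('METRO', 'REGION')}


def check_resolution(geo: str, resolution: str) -> str:
    """Return resolution unchanged, or raise ValueError for a known-bad pair."""
    if (geo_level(geo), resolution) in INVALID:
        raise ValueError(f'{resolution} breakdown is not available for {geo!r}')
    return resolution
```

Calling check_resolution(self.geo, resolution) just before writing into interest_by_region_widget['request'] would surface the problem locally instead of as a failed Google Trends request.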

Note that all of these options require modifying the library code. To do it without editing pytrends itself, create your own subclass of TrendReq and override the interest_by_region method with your own modified copy.

from pytrends.request import TrendReq


class MyTrendReq(TrendReq):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def interest_by_region(self, resolution='COUNTRY', inc_low_vol=False,
                           inc_geo_code=False):
        # Your modified copy of the original method goes here.
        ...

Upvotes: 2
