harry

Reputation: 95

Access Google Trends Data without a wrapper, or with the API: Python

I am trying to write a Python program to gather data from Google Trends (GT). Specifically, I want to open URLs automatically and read the values that are displayed in the line graphs.

I would be happy either downloading the CSV files or scraping the values from the page (from what I can see in Inspect Element, cleaning the data would only take a simple split or two). I have many searches to run, one for each of many different keywords.

I am generating many URLs to gather data from Google Trends, based on the actual URL from a test search. Example of a URL: https://trends.google.com/trends/explore?q=sports%20cars&geo=US Opening this URL manually in a browser shows the relevant GT page. The problem comes when I try to access it from a program.

Most responses I have seen suggest using public modules from pip (e.g. PyTrends and the "Unofficial Google Trends API"). My project manager has insisted that I not use modules that are not created directly by the site (i.e. APIs are acceptable, but only an official Google API). Only BeautifulSoup has been sanctioned as a third-party library (don't ask why).

Below is an example of the code I have tried. I know it is basic, but on the very first request I got:

HTTPError: HTTP Error 429: unknown (i.e. too many requests).

Some responses to other questions mention a Google Trends API. Is this real? I could not find any documentation on an official API.

Here is another post outlining a solution that I tried, but it did not work for me:

https://codereview.stackexchange.com/questions/208277/web-scraping-google-trends-in-python

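from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

def get_divs(url):
    # Fetch the page and return every <div> element found in the HTML
    html = urlopen(url).read()
    soup = bs(html, 'html.parser')
    return soup.find_all('div')

divs = get_divs('https://trends.google.com/trends/explore?q=sports%20cars&geo=US')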

Upvotes: 9

Views: 8271

Answers (1)

QHarr

Reputation: 84465

The page loads its data from an API endpoint that you can find in the browser's network tab:

import requests
import json

r = requests.get('https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60')
data = json.loads(r.text.lstrip(")]}\',\n"))

for item in data['default']['timelineData']:
    print(item['formattedAxisTime'], item['value'])
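
As a side note on the parsing step above, here is a minimal sketch that separates it from the network call so it can be checked offline. It assumes the body is a short non-JSON prefix such as `)]}',` followed by a single JSON object, which is what the `lstrip` call above implies. The helper names `parse_trends_payload` and `timeline_rows` are hypothetical, and the sample values are made up purely to show the shape.

```python
import json

def parse_trends_payload(text):
    # Assumption: everything before the first '{' is a guard prefix, not data.
    start = text.find('{')
    if start == -1:
        raise ValueError('no JSON object found in response body')
    return json.loads(text[start:])

def timeline_rows(data):
    # Same keys as the loop above: default -> timelineData -> item fields.
    return [(item['formattedAxisTime'], item['value'])
            for item in data['default']['timelineData']]

# Made-up sample body, only to illustrate the structure (not real Trends data).
sample_body = ")]}',\n" + json.dumps({
    'default': {'timelineData': [
        {'formattedAxisTime': 'week A', 'value': [10]},
        {'formattedAxisTime': 'week B', 'value': [20]},
    ]}
})

rows = timeline_rows(parse_trends_payload(sample_body))
for label, value in rows:
    print(label, value)
```

With a live response you would pass `r.text` instead of `sample_body`.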

We can unquote the URL to get a better idea of what is going on:

import urllib.parse

url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
print(urllib.parse.unquote(url))

This yields:

'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req={"time":"2018-05-29+2019-05-29","resolution":"WEEK","locale":"en-GB","comparisonItem":[{"geo":{"country":"US"},"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"sports+cars"}]}}],"requestOptions":{"property":"","backend":"IZG","category":0}}&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
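
If you want the individual fields rather than one long string, a rough sketch along these lines may help. It uses only the standard library and the URL quoted above, and makes no network request. Note that `parse_qs` treats `+` as a space, so the time range and keyword come back with spaces.

```python
import json
from urllib.parse import urlsplit, parse_qs

url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'

# Split the query string into a dict of lists (repeated keys such as tz are kept).
params = parse_qs(urlsplit(url).query)

# The req parameter is itself a JSON document.
req = json.loads(params['req'][0])
keyword = req['comparisonItem'][0]['complexKeywordsRestriction']['keyword'][0]['value']

print(req['time'])
print(req['resolution'])
print(keyword)
print(params['token'][0])
```

This makes it easier to see which parts (time range, geo, keyword, token) would have to change from one search to the next.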

You'll need to explore how transferable the elements of this URL are, for example whether the token can be reused for other keywords and date ranges.

For example, I looked at the search term banana, and this was the result:

unquoted:

'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req={"comparisonItem":[{"keyword":"banana","geo":"US","time":"today+12-m"}],"category":0,"property":""}&tz=-60'

quoted:

'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22banana%22,%22geo%22:%22US%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-60'
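
Since you have many keywords, here is a sketch of how the quoted URL could be generated instead of copied by hand. The `req` structure and the `hl` and `tz` values are taken from the banana example above; the function name `build_explore_url` and its defaults are my own assumptions. Whether the endpoint accepts such a request outside a browser session is not something I have verified, and the 429 in the question suggests it may not. `urlencode` also escapes `:` and `,`, so the string will not match the browser's URL character for character, but it decodes to the same request.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_URL = 'https://trends.google.com/trends/api/explore'

def build_explore_url(keyword, geo='US', time='today 12-m', hl='en-GB', tz='-60'):
    # Mirror the req object seen in the unquoted banana URL above.
    req = {
        'comparisonItem': [{'keyword': keyword, 'geo': geo, 'time': time}],
        'category': 0,
        'property': '',
    }
    query = urlencode({
        'hl': hl,
        'tz': tz,
        # Compact separators keep the JSON free of extra spaces.
        'req': json.dumps(req, separators=(',', ':')),
    })
    return EXPLORE_URL + '?' + query

url = build_explore_url('banana')
print(url)

# Round trip: decoding the generated URL gives back the original request.
decoded = json.loads(parse_qs(urlsplit(url).query)['req'][0])
print(decoded['comparisonItem'][0])
```

Looping over a list of keywords and calling `build_explore_url` for each one gives you one URL per search.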

Upvotes: 6
