Reputation: 95
I am trying to write a Python program to gather data from Google Trends (GT)- specifically, I want to automatically open URLs and access the specific values that are displayed in the line graphs:
I would be happy with downloading the CSV files, or with web-scraping the values (based on my reading of Inspect Element, cleaning the data would only require a simple split or two). I have many searches I want to conduct (many different keywords)
I am creating many URLs to gather data from Google Trends. I used the actual URL from a test search. Example of a URL: https://trends.google.com/trends/explore?q=sports%20cars&geo=US Physically searching this URL on a browser shows the relevant GT page. The problem comes when I try to access it through a program.
Most responses I have seen suggest using public modules from Pip (e.g. PyTrends and the "Unofficial Google Trends API")- my project manager has insisted I do not use modules that are not directly created by the site (i.e.: APIs are acceptable but only an official Google API). Only BeautifulSoup has been sanctioned as a plugin (don't ask why).
Below is an example of the code I have tried. I know it is basic, but on the very first request I got:
HTTPError: HTTP Error 429: unknown": too many requests.
Some responses to other questions mention Google Trends API - is this real? I could not find any documentation on an official API.
Here is another post which outlined a solution that I have tried that did not work for me:
https://codereview.stackexchange.com/questions/208277/web-scraping-google-trends-in-python
url = 'https://trends.google.com/trends/explore?q=sports%20cars&geo=US'
html = urlopen(url).read()
soup = bs(html, 'html.parser')
divs = soup.find_all('div')
return divs
Upvotes: 9
Views: 8271
Reputation: 84465
It's using an API you can find in the network tab
import requests
import json
r = requests.get('https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60')
data = json.loads(r.text.lstrip(")]}\',\n"))
for item in data['default']['timelineData']:
print(item['formattedAxisTime'], item['value'])
We can unquote the url to have a better idea of what is going on:
import urllib.parse
url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
print(urllib.parse.unquote(url))
This yields:
'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req={"time":"2018-05-29+2019-05-29","resolution":"WEEK","locale":"en-GB","comparisonItem":[{"geo":{"country":"US"},"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"sports+cars"}]}}],"requestOptions":{"property":"","backend":"IZG","category":0}}&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
You'll need to explore how transferable elements from this are.
For example, I looked at search term banana and this was the result:
unquoted:
'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req={"comparisonItem":[{"keyword":"banana","geo":"US","time":"today+12-m"}],"category":0,"property":""}&tz=-60'
quoted:
'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22banana%22,%22geo%22:%22US%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-60'
Upvotes: 6