Brian

Reputation: 251

Scraping Google daily search trends

I've taught myself web scraping and would like to scrape data from the Google daily search trends page here: https://trends.google.com/trends/trendingsearches/daily?geo=US The data would include the search keywords, their ranks, and their search frequencies on a daily basis.

I first tried scraping with R using the rvest library, but the extraction command returned empty data. I guess the HTML structure of the website is too complex for what rvest can handle? So I'd like to learn a better approach that works for this website.

I searched for information specific to scraping the daily searches but couldn't find any, as most posts were concerned with extracting Google Trends data rather than the daily search trends.

What would be an effective way to extract data from this website or, more generally, this kind of website? I'm happy to learn tools other than R, and I have basic knowledge of Python and JavaScript. If anyone could give me a hint, I will dig into it, but at the moment I have no idea where to even start.

Thanks,

Upvotes: 1

Views: 4150

Answers (1)

WayToDoor

Reputation: 1750

Have a look at the HTML using the 'Inspect Element' tool in Firefox.

Essentially, every element you want to scrape from the webpage can be identified easily from what the tooltip shows:

[Image: The tooltip used]

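Given that, we can use Selenium to scrape the webpage and retrieve this information. Selenium drives a real browser, so it sees the page after its JavaScript has run, which is likely why rvest returned nothing.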

(Install it first with pip3 install -U selenium, then install the webdriver for your browser from the links here)

Start a browser and point it at the Google Trends page using something similar to this:

╰─ ipython3
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from selenium import webdriver

In [2]: browser = webdriver.Firefox()
   ...: browser.get('https://trends.google.com/trends/trendingsearches/daily?geo=US')

You should now see something similar to this:

[Image: What you see on starting selenium webdriver]

Again using the Inspect Element tool, get the class of the div that contains every element to scrape:

[Image: What to scrape]

We need to find the div with a class named feed-list-wrapper.

In [3]: list_div = browser.find_element_by_class_name("feed-list-wrapper")

In [4]: list_div
Out[4]: <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="b889702e-7e2b-7448-9180-c9fb3d1ff641", element="cad96530-3444-9d4f-a8e8-b7da780f5751")>

Once done, get the list of divs with the class details inside it:

In [5]: details_divs = list_div.find_elements_by_class_name("details")

Then, for example, print each title (you should understand the code by now):

In [6]: for detail_div in details_divs:
    ...:     print(detail_div.find_element_by_class_name("details-top").find_element_by_xpath("div/span/a").text)
    ...:
Captain Marvel
Celia Barquin Arozamena
Yom Kippur
Lethal White
National Cheeseburger Day 2018
Ind vs HK
Mario Kart
Barcelona
Emilia Clarke
Elementary
Angela Bassett
Lenny Kravitz
Lil Uzi Vert
Handmaid's Tale
Mary Poppins Returns trailer
Hannah Gadsby
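The question also asks for ranks. The page does not show a rank number in this output, so a simple approach is to take the position in the list as the rank. That assumes the page lists searches from most to least searched, which the output suggests but which I have not verified. A minimal sketch, where rank_titles is just a name made up for this example:

```python
def rank_titles(titles):
    """Pair each title with its 1-based position in the scraped list."""
    return list(enumerate(titles, start=1))


# First three titles from the output above, used as sample data.
sample = ["Captain Marvel", "Celia Barquin Arozamena", "Yom Kippur"]

for rank, title in rank_titles(sample):
    print("{rank}. {title}".format(rank=rank, title=title))
```

With Selenium you would build the list of titles from details_divs first and pass it to the function.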

Another example, to get the search count:

In [7]: for detail_div in details_divs:
    ...:     title = detail_div.find_element_by_class_name("details-top").find_element_by_xpath("div/span/a").text
    ...:     search_count = detail_div.find_element_by_xpath('..').find_element_by_class_name("search-count-title").text
    ...:     print("Title : {title} \t\t\t Searches : {search_count}".format(title=title, search_count=search_count))
    ...:
Title : Captain Marvel           Searches : 500 k+
Title : Celia Barquin Arozamena              Searches : 200 k+
Title : Yom Kippur           Searches : 100 k+
Title : Lethal White             Searches : 50 k+
Title : National Cheeseburger Day 2018           Searches : 50 k+
Title : Ind vs HK            Searches : 50 k+
Title : Mario Kart           Searches : 50 k+
Title : Barcelona            Searches : 50 k+
Title : Emilia Clarke            Searches : 50 k+
Title : Elementary           Searches : 20 k+
Title : Angela Bassett           Searches : 20 k+
Title : Lenny Kravitz            Searches : 20 k+
Title : Lil Uzi Vert             Searches : 20 k+
Title : Handmaid's Tale              Searches : 20 k+
Title : Mary Poppins Returns trailer             Searches : 20 k+
Title : Hannah Gadsby            Searches : 20 k+
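The counts come back as text such as "500 k+", so if you want to sort or store them as numbers you need to convert them. Here is a small sketch of one way to do it. parse_search_count is a hypothetical helper, not part of Selenium, and the "M" suffix is my assumption, since only "k+" appears in the output above. The result is a lower bound because the page only reports "at least this many".

```python
import re

# Multipliers for the suffixes; only "k" is seen in the output above,
# "m" is an assumed suffix for millions.
_MULTIPLIERS = {"": 1, "k": 1000, "m": 1000000}


def parse_search_count(label):
    """Convert a label such as '500 k+' to a lower-bound integer."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kKmM]?)\s*\+?\s*", label)
    if match is None:
        raise ValueError("unrecognised search count: {!r}".format(label))
    number, suffix = match.groups()
    return int(float(number) * _MULTIPLIERS[suffix.lower()])


print(parse_search_count("500 k+"))
print(parse_search_count("20 k+"))
```

Anything that does not match the expected pattern raises a ValueError, so a change in the page's wording shows up as an error instead of a silently wrong number.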

You should get used to Selenium quickly. If you have any doubts about the methods used here, see the Selenium docs.
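A note for newer Selenium releases: the find_element_by_* helpers used in this answer were removed in Selenium 4.3, where the equivalent call is find_element(by, value). Below is a rough sketch of the same steps in that style. It assumes the class names from this answer (feed-list-wrapper, details, details-top, search-count-title) are still what the page uses, which may no longer be true. extract_trends and scrape_with_firefox are names invented for this sketch. The locator strings are the values of By.CLASS_NAME and By.XPATH, written out so the extraction function can be tried without a browser.

```python
URL = "https://trends.google.com/trends/trendingsearches/daily?geo=US"

# Selenium 4 locator strategies. These are the string values of
# By.CLASS_NAME and By.XPATH in selenium.webdriver.common.by.
CLASS_NAME = "class name"
XPATH = "xpath"


def extract_trends(root):
    """Return (title, search_count) pairs found under a driver or element."""
    list_div = root.find_element(CLASS_NAME, "feed-list-wrapper")
    rows = []
    for detail_div in list_div.find_elements(CLASS_NAME, "details"):
        title = (detail_div.find_element(CLASS_NAME, "details-top")
                 .find_element(XPATH, "div/span/a").text)
        # The search count lives in the parent of the details div.
        count = (detail_div.find_element(XPATH, "..")
                 .find_element(CLASS_NAME, "search-count-title").text)
        rows.append((title, count))
    return rows


def scrape_with_firefox(timeout=15):
    """Open the page in Firefox, wait for the list to render, extract it."""
    # Imported here so the module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(URL)
        # The list is built by JavaScript, so wait until at least one
        # "details" element exists before reading anything.
        WebDriverWait(browser, timeout).until(
            lambda d: d.find_elements(CLASS_NAME, "details"))
        return extract_trends(browser)
    finally:
        browser.quit()
```

Calling scrape_with_firefox() should return a list like [("Captain Marvel", "500 k+"), ...] if the page structure still matches. The explicit wait is there because reading the page immediately after browser.get can run before the list has been rendered.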

Upvotes: 7
