Luke

Reputation: 25

Web Scraping Google Result with R

I want to scrape all the titles of the results from a Google search.

For example, if I google 'asus', I want to scrape all the titles on the first page.

My problem is that my result is empty.

The code is as below:

library(rvest)

url <- 'https://www.google.com/search?q=asus'
first_page <- read_html(url)
title <- html_nodes(first_page, 'h3.LC20lb.DKV0Md') %>% html_text()

I use 'h3.LC20lb.DKV0Md' because that is the selector I found when inspecting the page source.

Upvotes: 2

Views: 4545

Answers (1)

Allan Cameron

Reputation: 173793

The problem is that the class names on Google search pages are not constant, so you need to select by tag names instead of class names. I find this easier with XPath than with CSS selectors:

library(tidyverse)
library(rvest)

url = 'https://www.google.com/search?q=asus'
first_page <- read_html(url)
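# Select leaf <div>s inside result links by tag structure, not by class name
titles <- html_nodes(first_page, xpath = "//div/div/div/a/div[not(div)]") %>% 
          html_text()
# Drop navigation text and empty strings
titles <- titles[titles != ">"]
titles <- titles[titles != "View all"]
titles <- titles[nzchar(titles)]

# Remaining entries alternate: title (odd positions), then URL (even positions)
df <- tibble(title  = titles[1:(length(titles)/2) * 2 - 1],
             url    = titles[1:(length(titles)/2) * 2])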
df
#> # A tibble: 7 x 2
#>   title                                url                                      
#>   <chr>                                <chr>                                    
#> 1 ASUS United Kingdom                  https://www.asus.com › ...               
#> 2 Asus - Wikipedia                     https://en.wikipedia.org › wiki › Asus   
#> 3 Asus Store: Computers & Accessories~ https://www.amazon.co.uk › Asus-Computer~
#> 4 ASUS - Amazon.co.uk                  https://www.amazon.co.uk › stores › ASUS~
#> 5 ASUS RMA                             https://rma.asus-europe.eu               
#> 6 ASUS Subreddit                       https://www.reddit.com › ASUS            
#> 7 ASUS Deals | Laptops Direct          https://www.laptopsdirect.co.uk › asus

Created on 2020-03-02 by the reprex package (v0.3.0)
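
If you want to check the idea without hitting Google, here is a minimal offline sketch. The HTML string is made up for illustration and only imitates the nesting that the XPath above relies on; the real page markup is different and changes over time. It shows the class-based selector matching nothing while the tag-based XPath still finds the text, and then splits the alternating title/URL entries using seq().

```r
library(rvest)

# Hypothetical markup standing in for a results page (an assumption,
# not actual Google HTML). Note that there are no h3.LC20lb.DKV0Md nodes.
page <- read_html('
<html><body>
  <div><div><div>
    <a href="https://www.asus.com/"><div>ASUS United Kingdom</div><div>https://www.asus.com</div></a>
  </div></div></div>
  <div><div><div>
    <a href="https://en.wikipedia.org/wiki/Asus"><div>Asus - Wikipedia</div><div>https://en.wikipedia.org</div></a>
  </div></div></div>
</body></html>')

# Class-based selector: returns an empty node set when the classes are absent
class_hits <- html_nodes(page, "h3.LC20lb.DKV0Md")

# Tag-based XPath: leaf <div>s directly inside <a>, nested three <div>s deep
texts <- html_nodes(page, xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

# Entries alternate title, URL, title, URL, ...
titles <- texts[seq(1, length(texts), by = 2)]
urls   <- texts[seq(2, length(texts), by = 2)]

data.frame(title = titles, url = urls, stringsAsFactors = FALSE)
```

The seq() calls pick the same odd and even positions as the index arithmetic in the answer; they are just a bit easier to read. If the real page does not alternate cleanly between titles and URLs, the two columns will be misaligned, so it is worth inspecting the raw text vector first.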

Upvotes: 4
