Reputation: 25
I want to scrape all the titles of the results from a Google search.
For example, if I google 'asus', I want to scrape all the titles on the first page.
My problem is that the result is empty.
The code is as below:
library(rvest)
url <- 'https://www.google.com/search?q=asus'
first_page <- read_html(url)
title <- html_nodes(first_page, 'h3.LC20lb.DKV0Md') %>% html_text()
I used 'h3.LC20lb.DKV0Md' because that is the selector I found when I inspected the page source in the browser.
Upvotes: 2
Views: 4545
Reputation: 173793
The problem is that the class names in Google search results are not constant, so you need to select by tag names instead of class names. I find this easier with XPath than with CSS selectors:
library(tidyverse)
library(rvest)

url <- 'https://www.google.com/search?q=asus'
first_page <- read_html(url)

# Select leaf <div>s inside result links by structure, not by class name
titles <- html_nodes(first_page, xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

# Drop navigation text and empty strings
titles <- titles[titles != ">"]
titles <- titles[titles != "View all"]
titles <- titles[nzchar(titles)]

# Odd positions hold the titles, even positions hold the displayed URLs
df <- tibble(title = titles[1:(length(titles)/2) * 2 - 1],
             url   = titles[1:(length(titles)/2) * 2])
df
#> # A tibble: 7 x 2
#>   title                                url
#>   <chr>                                <chr>
#> 1 ASUS United Kingdom                  https://www.asus.com › ...
#> 2 Asus - Wikipedia                     https://en.wikipedia.org › wiki › Asus
#> 3 Asus Store: Computers & Accessories~ https://www.amazon.co.uk › Asus-Computer~
#> 4 ASUS - Amazon.co.uk                  https://www.amazon.co.uk › stores › ASUS~
#> 5 ASUS RMA                             https://rma.asus-europe.eu
#> 6 ASUS Subreddit                       https://www.reddit.com › ASUS
#> 7 ASUS Deals | Laptops Direct          https://www.laptopsdirect.co.uk › asus
Created on 2020-03-02 by the reprex package (v0.3.0)
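If you want to check the idea without hitting Google, here is a minimal offline sketch. The HTML fragment and its class names (x1, x2, q7, q8) are made up for illustration and are not real Google markup. It only assumes that each result link holds a title div followed by a URL div. It shows that a class-based selector misses results whose class names differ, while the structural XPath from above picks up all of them:

```r
library(rvest)

# Hypothetical markup (not real Google output): each result link holds a
# title <div> followed by a URL <div>, and the class names are invented.
html <- '
<div><div><div>
  <a href="#"><div class="x1">ASUS United Kingdom</div><div class="x2">https://www.asus.com</div></a>
</div></div></div>
<div><div><div>
  <a href="#"><div class="q7">Asus - Wikipedia</div><div class="q8">https://en.wikipedia.org</div></a>
</div></div></div>'
page <- read_html(html)

# A class-based selector only matches the result that uses that exact class
by_class <- html_text(html_nodes(page, "div.x1"))

# The structural XPath ignores class names and matches every leaf <div> in a link
by_tag <- html_text(html_nodes(page, xpath = "//div/div/div/a/div[not(div)]"))

# Titles sit at odd positions, URLs at even positions
df <- data.frame(
  title = by_tag[seq(1, length(by_tag), by = 2)],
  url   = by_tag[seq(2, length(by_tag), by = 2)],
  stringsAsFactors = FALSE
)
```

Here by_class contains only the first title, whereas by_tag contains both titles and both URLs, which are then split into two columns by position. The real page structure can differ from this fragment, so the XPath may need adjusting.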
Upvotes: 4