Reputation: 1
I am trying to learn web scraping by scraping average Google review ratings and addresses from Google Maps, and I have a few questions I hope to get help with:
This company has 6 outlets, but only 5 have reviews.
So when I run the following code to extract the outlet names and ratings, I get 6 names but only 5 ratings, and I cannot match each outlet name to its rating.
The cause is that the outlet with no reviews has no element with the class my code uses to read the aria-label.
Is it possible to extract the information in a way that lets me match each rating to its outlet?
pacman::p_load(rvest,
               xml2,
               RSelenium)
# Start a Firefox session and keep the client object
rmDr <- rsDriver(browser = "firefox", port = 4443L)$client
myclient <- rmDr
myclient$navigate("https://www.google.com/maps/search/attea/@1.3586881,103.8165833,12z/data=!3m1!4b1")
# Parse the rendered page source
html_obj <- myclient$getPageSource(header = TRUE)[[1]] %>% read_html()
# Outlet names (6 results)
names <- html_obj %>% html_elements(".hfpxzc") %>%
  html_attr("aria-label")
name_outlet <- as.data.frame(names)
# Ratings and review counts (only 5 results, since one outlet has no reviews)
stars_numrev <- html_obj %>% html_elements(".ZkP5Je") %>%
  html_attr("aria-label")
stars_numrev_outlet <- as.data.frame(stars_numrev)
Sorry for the long list of questions. I have been interested in scraping for a long time and I have been learning primarily from web sources. Any help or directions to useful guides will be deeply appreciated. Cheers!
Upvotes: 0
Views: 323
Reputation: 1431
For your example, I was able to pull the address using this:
html_obj %>%
  html_node("div.RcCsl:nth-child(3) > button:nth-child(1) > div:nth-child(1) > div:nth-child(3) > div:nth-child(1)") %>%
  html_text()
[1] "2500 Chestnut Ave, Glenview, IL 60026"
I tried it with a few different locations, so it seems to be reasonably robust. The class name .Io6YTe also appears to be the same across locations, so it would probably work as a selector too. You may need to add a step to wait for the page to load before pulling the HTML from the page.
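For the waiting step, here is a minimal sketch of a polling helper. The names `wait_until`, `is_ready`, and `fake_ready` are my own and not part of RSelenium, and the stand-in check only exists so the snippet runs without a browser.

```r
# Poll `is_ready` (a function returning TRUE/FALSE) until it succeeds
# or `timeout` seconds have passed. Hypothetical helper, not an RSelenium API.
wait_until <- function(is_ready, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(is_ready())) return(TRUE)       # condition met
    if (Sys.time() >= deadline) return(FALSE)  # gave up
    Sys.sleep(interval)
  }
}

# Stand-in check so the sketch runs on its own: becomes TRUE on the third call.
calls <- 0
fake_ready <- function() {
  calls <<- calls + 1
  calls >= 3
}

loaded <- wait_until(fake_ready, timeout = 5, interval = 0.1)
```

With the session from the question, the check could be something like `function() length(myclient$findElements(using = "css selector", value = ".Io6YTe")) > 0`, and you would call `getPageSource()` only once `wait_until()` returns TRUE. I have not tuned the timeout against Google Maps, so treat the numbers as placeholders.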
I'm still working on question 3, I'll edit this answer if I figure anything out. What's the end goal for scrolling to the bottom? Maybe there's an alternative way to reach that goal?
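On the name/rating mismatch, one approach that may help is to select each result card first and then look up the name and rating inside each card. In rvest, `html_element()` (singular) returns exactly one result per input node, so a card without a rating gives NA instead of being dropped. The sketch below uses a toy page so it runs without a browser; the container class `card` is hypothetical, and you would need to inspect the Maps page to find the real element that wraps each result. `.hfpxzc` and `.ZkP5Je` are the classes from your code.

```r
library(rvest)

# Toy stand-in for the results page: three outlet cards, the last one has no rating.
# "card" is a hypothetical container class used only for this illustration.
page <- minimal_html('
  <div class="card"><a class="hfpxzc" aria-label="Outlet A"></a><span class="ZkP5Je" aria-label="4.5 stars 120 Reviews"></span></div>
  <div class="card"><a class="hfpxzc" aria-label="Outlet B"></a><span class="ZkP5Je" aria-label="4.1 stars 35 Reviews"></span></div>
  <div class="card"><a class="hfpxzc" aria-label="Outlet C"></a></div>
')

# One node per outlet
cards <- html_elements(page, "div.card")

# html_element() keeps one entry per card; a missing rating becomes NA
outlets <- data.frame(
  name = html_attr(html_element(cards, ".hfpxzc"), "aria-label"),
  stars_numrev = html_attr(html_element(cards, ".ZkP5Je"), "aria-label"),
  stringsAsFactors = FALSE
)
print(outlets)
```

Because both columns come from the same set of cards, row i of `name` always lines up with row i of `stars_numrev`, which is what the two separate `html_elements()` calls could not guarantee.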
Upvotes: 0