Skyzf

Reputation: 1

Issues with Google maps scraping using R

I am trying to learn web scraping by scraping average google reviews and addresses on google maps. But I have a few questions that I hope to get help on:

  1. Missing reviews, making data difficult to match

This company has 6 outlets, but only 5 have reviews.

[screenshot: Google Maps results]

When I run the following code to extract the outlet names and ratings, I get 6 names but only 5 ratings, so I am unable to match each outlet name to its rating.

[screenshots: name results and rating results]

The issue is that the outlet with no reviews has no rating element at all, so there is no class or aria-label for my code to select.

[screenshot: inspect element]

Is it possible for me to extract the information in such a way that I can match each rating to its outlet name? (A per-card workaround is sketched after my code below.)

pacman::p_load(rvest, xml2, RSelenium)

# Start a Firefox session and open the Google Maps search results
rmDr <- rsDriver(browser = "firefox", port = 4443L)$client
myclient <- rmDr
myclient$navigate("https://www.google.com/maps/search/attea/@1.3586881,103.8165833,12z/data=!3m1!4b1")

# Parse the rendered page source
html_obj <- myclient$getPageSource(header = TRUE)[[1]] %>% read_html()

# Outlet names come from the aria-label on each result link
names <- html_obj %>%
  html_elements(".hfpxzc") %>%
  html_attr("aria-label")
name_outlet <- as.data.frame(names)

# Ratings and review counts come from the aria-label on the stars element
stars_numrev <- html_obj %>%
  html_elements(".ZkP5Je") %>%
  html_attr("aria-label")
stars_numrev_outlet <- as.data.frame(stars_numrev)
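
One way to keep the two vectors aligned might be to select each result card first and then pull both the name and the rating from within it: html_element() (singular) returns a missing node when a card has no rating, so html_attr() yields NA there and the lengths stay matched. This is only a sketch; ".Nv2PK" is a guess at the result-card container class and must be checked with inspect element.

# Sketch: extract name and rating per result card so a missing rating becomes NA.
# ".Nv2PK" is an assumed card-container class; verify it in the DOM first.
cards <- html_obj %>% html_elements(".Nv2PK")

outlets <- data.frame(
  name   = cards %>% html_element(".hfpxzc") %>% html_attr("aria-label"),
  rating = cards %>% html_element(".ZkP5Je") %>% html_attr("aria-label")
)
# The outlet with no reviews now shows rating = NA instead of being dropped.
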
  2. The second issue is that I am trying to extract the address as well, but again I am unable to extract the text because the element has no aria-label or class to target.

[screenshot: element example]

  3. Last question: how do I get the side bar to scroll to the bottom? I can't use RSelenium to send key presses to it. Would I be able to simulate hovering the mouse over the side bar and then simulate mouse scrolling?
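
Since RSelenium can run JavaScript in the page, one alternative to key presses or simulated mouse scrolling would be to scroll the results pane directly with executeScript(). A sketch, assuming the scrollable side bar is the element with role=feed (this selector is a guess and should be checked in the DOM):

# Sketch: scroll the results side bar via JavaScript to trigger lazy loading.
# 'div[role=feed]' is an assumed selector for the scrollable pane; verify it.
for (i in 1:10) {
  myclient$executeScript(
    "var el = document.querySelector('div[role=feed]');
     if (el) { el.scrollTop = el.scrollHeight; }",
    args = list()
  )
  Sys.sleep(2)  # give newly loaded results time to render
}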

Sorry for the long list of questions. I have been interested in scraping for a long time and have been learning primarily from web sources. Any help or pointers to useful guides would be deeply appreciated. Cheers!

Upvotes: 0

Views: 323

Answers (1)

Russ

Reputation: 1431

For your example, I was able to pull the address using this:

html_obj %>%
  html_node("div.RcCsl:nth-child(3) > button:nth-child(1) > div:nth-child(1) > div:nth-child(3) > div:nth-child(1)") %>%
  html_text()
#> [1] "2500 Chestnut Ave, Glenview, IL 60026"

I tried it with a few different locations, so it seems reasonably robust. The class name .Io6YTe also appears to be the same across locations, so that would probably work as an html_node() selector too. You may need to add a step that waits for the page to load before pulling the HTML.
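
The simplest wait is a fixed Sys.sleep() after navigate(); a slightly more robust sketch polls the page until the selector matches. wait_for() here is a hypothetical helper, not part of RSelenium:

# Sketch: re-read the page until a CSS selector matches, instead of a fixed sleep.
# wait_for() is a hypothetical helper; ".Io6YTe" is the address class noted above.
wait_for <- function(client, css, timeout = 10) {
  deadline <- Sys.time() + timeout
  repeat {
    page <- client$getPageSource()[[1]] %>% read_html()
    if (length(html_elements(page, css)) > 0) return(page)
    if (Sys.time() > deadline) stop("Timed out waiting for: ", css)
    Sys.sleep(0.5)
  }
}

html_obj <- wait_for(myclient, ".Io6YTe")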

I'm still working on question 3; I'll edit this answer if I figure anything out. What's the end goal for scrolling to the bottom? Maybe there's an alternative way to reach that goal?

Upvotes: 0
