JHall651

Reputation: 437

How to scrape text from a webpage that requires interaction in R

I am trying to scrape reviews from a webpage to determine word frequency. However, longer reviews are truncated on the page, and you have to click "More" to make the page show the full text. Here is the code I am using to extract the review text. How can I "click" on "More" to get the full review?

library(rvest)

tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"

webpage <- read_html(tripAdvisorURL)

# select every node whose class attribute contains "partial_entry"
reviewData <- html_nodes(webpage, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "partial_entry", " " ))]')

head(reviewData)

xml_text(reviewData[[1]])

[1] "The rooms were clean and we slept so good we had room 10 and 12 we 
didn’t use 12 but it joins 10 .kind of strange but loved the hotel ..me 
personally I would take the hot tub out it was kinda old..the lady 
that...More"

Upvotes: 0

Views: 393

Answers (1)

Yifu Yan

Reputation: 6106

As mentioned in the comment, you can use RSelenium together with rvest when the page needs interaction before the content is available:

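library(RSelenium)
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# start a Selenium server and open a Chrome session
rmDr <- rsDriver(browser = "chrome")
myclient <- rmDr$client

tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"
myclient$navigate(tripAdvisorURL)
Sys.sleep(2)  # give the page a moment to load; adjust as needed

# select all "More" links and click each one to expand the reviews
webEles <- myclient$findElements(using = "css", value = ".ulBlueLinks")
for (webEle in webEles) {
    webEle$clickElement()
}

# grab the rendered HTML and parse it with rvest as before
mypagesource <- myclient$getPageSource()

reviews <- read_html(mypagesource[[1]]) %>%
    html_nodes(".partial_entry") %>%
    html_text()
reviews

# shut down the browser and the Selenium server when finished
myclient$close()
rmDr$server$stop()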
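
Since the goal was word frequency, here is a minimal base-R sketch of one way to count words once you have the character vector that html_text() returns. The sample `reviews` vector and the helper name `word_freq` are made up for illustration; they are not part of the original code, and the tokenization is deliberately simple (no stop-word removal or stemming).

```r
# Hypothetical sample standing in for the vector returned by html_text()
reviews <- c(
  "The rooms were clean and the staff were friendly.",
  "Clean rooms, great location. The hot tub was old."
)

# word_freq() is an illustrative helper, not a function from any package
word_freq <- function(texts) {
  cleaned <- tolower(texts)
  # normalize curly apostrophes so words like "didn't" stay in one piece
  cleaned <- gsub("\u2019", "'", cleaned, fixed = TRUE)
  # replace everything except letters, apostrophes and spaces with a space
  cleaned <- gsub("[^a-z' ]+", " ", cleaned)
  words <- unlist(strsplit(cleaned, "\\s+"))
  words <- words[nzchar(words)]  # drop empty tokens
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(reviews)
head(freq, 5)
```

For a real analysis you would probably want to drop common stop words ("the", "and", ...) before counting; a text-mining package can handle that, but the base-R version above is enough to check that the scraped text looks sensible.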

Upvotes: 1
