Web scraping with R: extracting rating marks from a web page

Question

I want to extract the rating information for a room (Accuracy, Communication, Cleanliness, Location, Check In, Value).

url <- "https://www.airbnb.com/rooms/8400275"
con <- file(url)
raw <- readLines(con)
close(con)

Now I need a pattern that will let me extract this information. In the page source I found these lines:

data-reactid=".1tzzodvxlvk.1.0.0.0.0.0.3.0.0.1.0">
class="star-rating" content="4.5"

As I understand it, this is the markup for the room's "Accuracy" rating. I want to extract the value content="4.5" together with the name of the rating, "Accuracy". How can I do that? The problem is that the page source contains many such content= and "Accuracy" strings.

Tomas H · Accepted Answer

For this particular page you could use the approach below. However, the code is not very robust: whether it works on other pages depends on whether they share the same structure.

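library(RCurl)
library(XML)

# Download the page source and parse it as HTML
url <- "https://www.airbnb.com/rooms/8400275"
url2 <- getURL(url)
parsed <- htmlParse(url2, encoding = "UTF-8")

# Text of the first <strong> label inside the ratings column (the rating name)
xpathSApply(parsed, "//div[@class='col-lg-6']//strong", xmlValue)[1]
# "content" attribute of the third star-rating element (the matching rating value)
xpathSApply(parsed, "//div[@class='star-rating-wrapper']//div[@class='star-rating']", xmlGetAttr, "content")[3]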
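To get all rating names and values at once rather than one index at a time, the same two XPath queries can be collected into a named vector. The following is only a minimal sketch: the HTML fragment is hypothetical and merely imitates the class names used above, so that the example runs without a network connection. On the real page the labels and ratings may be offset from each other (the code above uses index 1 for the label but index 3 for the value), so the positions would need checking before pairing them.

```r
library(XML)

# Hypothetical fragment standing in for the downloaded page source;
# the real markup may differ and can change at any time.
html <- '
<div class="col-lg-6">
  <div><strong>Accuracy</strong>
    <div class="star-rating-wrapper"><div class="star-rating" content="4.5"></div></div>
  </div>
  <div><strong>Communication</strong>
    <div class="star-rating-wrapper"><div class="star-rating" content="5"></div></div>
  </div>
</div>'

parsed <- htmlParse(html, asText = TRUE)

# Collect every label and every rating value, in document order
labels <- xpathSApply(parsed, "//div[@class='col-lg-6']//strong", xmlValue)
ratings <- xpathSApply(parsed,
                       "//div[@class='star-rating-wrapper']//div[@class='star-rating']",
                       xmlGetAttr, "content")

# Pair by position only when the counts agree; otherwise stop with an error
stopifnot(length(labels) == length(ratings))
result <- setNames(as.numeric(ratings), labels)
print(result)
```

With this in place, a single rating can be looked up by name, for example result["Accuracy"], instead of by a hard-coded position.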
