Reputation: 241
I am trying to scrape all the review details (Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, Cabin Staff Service, Food & Beverages, Inflight Entertainment, Ground Service, Wifi & Connectivity, Value For Money), including the star ratings, from the Airline Quality webpage:
https://www.airlinequality.com/airline-reviews/emirates/
My first attempt does not work as expected:
library(rvest)
library(purrr)
library(tibble)

my_url <- c("https://www.airlinequality.com/airline-reviews/emirates/")

# Returns the text of every ".review-value" cell on the page
review <- function(url) {
  read_html(url) %>%
    html_nodes(".review-value") %>%
    html_text() %>%
    as_tibble()
}

output <- map_dfr(my_url, review)
With the code below I am only able to scrape the star ratings. I need all the details together (e.g. Cabin Staff Service = rating 2, Food & Beverages = rating 5):
# Returns the class attribute of every ".star" element on the page
star <- function(url) {
  read_html(url) %>%
    html_nodes(".star") %>%
    html_attr("class") %>%
    as.factor() %>%
    as_tibble()
}

output_star <- map_dfr(my_url, star)
The output should be a table:
columns: Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, ... (with the star rating for the rated fields)
rows: one per review
Upvotes: 1
Views: 394
Reputation: 174278
It's a little involved because you need to count the filled stars to get the rating for each field. I would use html_table() to help, then re-insert the calculated star values:
require(tibble)
require(purrr)
require(rvest)
require(magrittr)  # provides equals(), which rvest does not re-export

my_url <- c("https://www.airlinequality.com/airline-reviews/emirates/")

# Count the filled stars in one rating cell
count_stars_in_cell <- function(cell)
{
  html_children(cell) %>%
    html_attr("class") %>%
    equals("star fill") %>%
    which %>%
    length
}

# Collect the star counts for every rating row in one review table
get_ratings_each_review <- function(review)
{
  review %>%
    html_nodes(".review-rating-stars") %>%
    lapply(count_stars_in_cell) %>%
    unlist
}

all_tables <- read_html(my_url) %>%
  html_nodes("table")

reviews <- lapply(all_tables, html_table)
ratings <- lapply(all_tables, get_ratings_each_review)

# html_table() reads each star cell as the text "12345"; swap in the counts
for (i in seq_along(reviews))
{
  reviews[[i]]$X2[reviews[[i]]$X2 == "12345"] <- ratings[[i]]
}

print(reviews)
This gives you a list with one table for each review. These should be straightforward to combine into a single data frame.
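One possible way to do that combination is sketched below. It assumes each table has the two columns X1 (field name) and X2 (value), as in the loop above. The `reviews_demo` list is made-up stand-in data so the snippet runs without hitting the website, and `review_to_row` is a hypothetical helper name, not part of any package. Using dplyr::bind_rows() here is my choice because it fills fields missing from a review with NA.

```r
library(tibble)

# Made-up stand-in for the `reviews` list (two reviews, different fields)
reviews_demo <- list(
  data.frame(X1 = c("Type Of Traveller", "Seat Comfort"),
             X2 = c("Business", "4"),
             stringsAsFactors = FALSE),
  data.frame(X1 = c("Type Of Traveller", "Seat Comfort", "Wifi & Connectivity"),
             X2 = c("Solo Leisure", "2", "1"),
             stringsAsFactors = FALSE)
)

# Turn one two-column review table into a one-row tibble,
# using the X1 values as column names
review_to_row <- function(tbl) {
  as_tibble(as.list(setNames(as.character(tbl$X2), tbl$X1)))
}

# Stack the rows; fields absent from a review become NA
combined <- dplyr::bind_rows(lapply(reviews_demo, review_to_row))
print(combined)
```

With the real data you would pass `reviews` instead of `reviews_demo`. Every column comes out as character because X2 mixes text and numbers; something like type.convert(combined, as.is = TRUE) should turn the rating columns back into integers.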
Upvotes: 4