inight974

Reputation: 39

Get rating for a movie

I have an assignment to get the rating of a movie from imdb.com. I am a beginner in R, so please forgive my ignorance. I came up with the solution below, which works, but I would like to learn from the best (you) whether there is a more efficient way to do this. I have trouble identifying the nodes, and the selector I ended up with looks far too long. Could you please help?

library(rvest)  # read_html(), html_nodes(), html_text(); rvest also re-exports %>%

pagetoread <- read_html("https://www.imdb.com/title/tt1877830/?ref_=fn_al_tt_1")
get_rating <- function(html){
  html %>% 
    # Long selector chained through auto-generated class names (sc-*, kUbSjY, ...)
    html_nodes('#__next > main > div > section.ipc-page-background.ipc-page-background--base.sc-c7f03a63-0.kUbSjY >
           section > div:nth-child(4) > section > section > div.sc-94726ce4-0.cMYixt >
           div.sc-db8c1937-0.eGmDjE.sc-94726ce4-4.dyFVGl > div > div:nth-child(1) > a >
           div > div > div.sc-7ab21ed2-0.fAePGh > div.sc-7ab21ed2-2.kYEdvH > span.sc-7ab21ed2-1.jGRxWM') %>% 
    html_text() %>%
    gsub("^\\s+|\\s+$", "", .)  # trim leading and trailing whitespace
}
get_rating(pagetoread)

Upvotes: 0

Views: 71

Answers (1)

QHarr

Reputation: 84465

Your mileage may vary over time, as I only checked a few titles. Currently, however, you can use the attribute = value selector below with the child combinator (`>`) to select the first child `span` of the element whose `data-testid` attribute has the value `hero-rating-bar__aggregate-rating__score`. This avoids the dynamically generated class names, so it should give some measure of robustness over time. It also avoids a long chain of selectors, each step of which can break when the page layout changes. Note that rvest translates CSS selectors into XPath internally, so the benefit here is robustness and readability rather than speed. Because you are only matching elements, not styling them, a short selector that uniquely identifies the target is all you need.
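A minimal offline sketch, assuming a simplified, hypothetical snippet modelled on the markup described above (the real IMDb page is much larger and its class names will differ). It applies the attribute selector, prints the XPath that selectr (the package rvest relies on for CSS-to-XPath translation) generates from it, and runs a hand-written XPath equivalent, which should return the same value.

```r
library(rvest)  # read_html(), html_element(), html_text(); re-exports %>%

# Hypothetical, simplified markup; not copied from IMDb
mock_page <- read_html('
<div data-testid="hero-rating-bar__aggregate-rating__score" class="sc-7ab21ed2-2 kYEdvH">
  <span class="sc-7ab21ed2-1 jGRxWM">8.1</span><span>/10</span>
</div>')

css_sel <- '[data-testid="hero-rating-bar__aggregate-rating__score"] > span:first-child'

# Show the XPath that selectr derives from the CSS selector
cat(selectr::css_to_xpath(css_sel), "\n")

# CSS: first child <span> of the element carrying the data-testid attribute
rating_css <- mock_page %>%
  html_element(css_sel) %>%
  html_text() %>%
  as.numeric()

# Hand-written XPath: first element child, kept only if it is a <span>
rating_xpath <- mock_page %>%
  html_element(xpath = '//*[@data-testid="hero-rating-bar__aggregate-rating__score"]/*[1][self::span]') %>%
  html_text() %>%
  as.numeric()

print(c(css = rating_css, xpath = rating_xpath))
```

Neither query mentions the `sc-*` or `kYEdvH`-style classes, so renaming those in the mock would not change the result.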

library(rvest)
library(magrittr)

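get_rating <- function(html) {
  html %>%
    # First child <span> of the element with data-testid="hero-rating-bar__aggregate-rating__score"
    html_element('[data-testid="hero-rating-bar__aggregate-rating__score"] > span:first-child') %>%
    html_text() %>%
    as.numeric()  # convert the score text to a number; NA if no element matched
}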

pagetoread <- read_html("https://www.imdb.com/title/tt1877830/?ref_=fn_al_tt_1")

get_rating(pagetoread)
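Because the selector depends on IMDb's current markup, one hedged extension is to check for `NA`, which is what the pipeline yields when no element matches. `safe_rating()` and `ratings_for()` are hypothetical helpers, not part of rvest. `ratings_for()` also assumes the `https://www.imdb.com/title/<id>/` URL pattern seen in the question and needs network access, so the offline check below only exercises the missing-element path.

```r
library(rvest)

get_rating <- function(html) {
  html %>%
    html_element('[data-testid="hero-rating-bar__aggregate-rating__score"] > span:first-child') %>%
    html_text() %>%
    as.numeric()
}

# Hypothetical helper: warn when the selector found nothing
safe_rating <- function(html) {
  r <- get_rating(html)
  if (is.na(r)) warning("Rating element not found; the page markup may have changed.")
  r
}

# Hypothetical helper: ratings for several title IDs (requires network access)
ratings_for <- function(ids) {
  urls <- paste0("https://www.imdb.com/title/", ids, "/")
  vapply(urls, function(u) safe_rating(read_html(u)), numeric(1))
}
# ratings_for(c("tt1877830"))

# Offline check: a page without the rating element gives NA plus a warning
no_rating <- read_html("<div><p>No rating yet</p></div>")
res <- suppressWarnings(safe_rating(no_rating))
print(res)
```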

Upvotes: 1
