Reputation: 39
I have to complete an assignment to get the rating of a movie from imdb.com. I am a beginner in R, so please forgive my ignorance. I came up with the solution below, which works, but I would like to learn from the best (you) whether there is a more efficient way to do this. I have trouble identifying the right nodes, and the selector I used looks far too long to me. Could you please help?
library(rvest)
pagetoread <- read_html("https://www.imdb.com/title/tt1877830/?ref_=fn_al_tt_1")
get_rating <- function(html) {
  html %>%
    # The selector is split into pieces only for readability; paste() rejoins them with spaces
    html_nodes(paste(
      "#__next > main > div >",
      "section.ipc-page-background.ipc-page-background--base.sc-c7f03a63-0.kUbSjY >",
      "section > div:nth-child(4) > section > section > div.sc-94726ce4-0.cMYixt >",
      "div.sc-db8c1937-0.eGmDjE.sc-94726ce4-4.dyFVGl > div > div:nth-child(1) > a >",
      "div > div > div.sc-7ab21ed2-0.fAePGh > div.sc-7ab21ed2-2.kYEdvH >",
      "span.sc-7ab21ed2-1.jGRxWM"
    )) %>%
    html_text() %>%
    # Trim leading and trailing whitespace
    gsub("^\\s+|\\s+$", "", .)
}
get_rating(pagetoread)
Upvotes: 0
Views: 71
Reputation: 84465
Your mileage may vary over time, as I only checked a few titles. Currently, however, you can use the attribute = value selector below with the child combinator (>) to select the first child span of the element whose data-testid attribute has the value hero-rating-bar__aggregate-rating__score. This avoids the dynamically generated classes, so it provides some measure of robustness over time. It also avoids a long, potentially fragile selector list. The short selector is cheap to match, and selector specificity is irrelevant here because you are only matching elements, not styling them.
library(rvest)
library(magrittr)
get_rating <- function(html) {
  html %>%
    # First child span of the element carrying the stable data-testid attribute
    html_element('[data-testid="hero-rating-bar__aggregate-rating__score"] > span:first-child') %>%
    html_text() %>%
    # Convert the scraped text to a number
    as.numeric()
}
pagetoread <- read_html("https://www.imdb.com/title/tt1877830/?ref_=fn_al_tt_1")
get_rating(pagetoread)
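If you want to see how the selector behaves without hitting the site, here is a small offline sketch. The HTML snippet is made up: it is a simplified stand-in for IMDb's markup with invented class names and an invented rating value. It assumes rvest 1.0.0 or later, which provides minimal_html() and html_element().

```r
library(rvest)

# Hypothetical, simplified markup. The class names and the rating value are
# invented; only the data-testid value is taken from the selector above.
page <- minimal_html('
  <div class="sc-abc123 xYzQwE">
    <div data-testid="hero-rating-bar__aggregate-rating__score">
      <span class="sc-7ab21ed2-1 jGRxWM">7.8</span><span>/10</span>
    </div>
  </div>
')

get_rating <- function(html) {
  html %>%
    html_element('[data-testid="hero-rating-bar__aggregate-rating__score"] > span:first-child') %>%
    html_text() %>%
    as.numeric()
}

# The selector ignores the generated class names and picks the first span
rating <- get_rating(page)
print(rating)

# A page without the attribute: html_element() returns a missing node,
# so the function yields NA instead of raising an error
no_rating <- get_rating(minimal_html("<p>No rating here</p>"))
print(no_rating)
```

Because html_element() returns a missing node rather than an empty set, the function returns NA when the attribute disappears from the page, which makes a later markup change easy to detect with is.na().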
Upvotes: 1