How to scrape the budget value of a movie from IMDB using rvest

Question

I tried to scrape the gross and budget values from IMDB.com using the rvest package, but it doesn't work. My code is:

library(rvest)    
movie <- html("http://www.imdb.com/title/tt1490017/")   
movie %>% 
html_node("#budget .itemprop") %>%     
html_text() %>%      
as.numeric()

and I get

numeric(0)

mpalanco · Accepted Answer

Sam Firke provided a very neat solution; I am posting mine only to show an alternative way to extract the numeric value. Like Sam Firke, I used SelectorGadget to find the selector. The html function seems to work fine. Instead of tidyr, which I didn't know had such a handy function, I used gsub:

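library(rvest)
# html() is the reader from older rvest releases; newer ones use read_html()
movie <- html("http://www.imdb.com/title/tt1490017/")
movie %>%
  html_node(".txt-block:nth-child(11)") %>%  # selector found with SelectorGadget
  html_text() %>%
  gsub("\\D", "", .) %>%                     # strip every non-digit character
  as.numeric()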

Output:

[1] 6e+07
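
To see what the gsub step does on its own, here is a minimal sketch that runs without touching the website. The string in budget_text is a made-up stand-in for whatever html_text() returns, not the actual text of the IMDB page, and the variable names are only for illustration.

```r
# Hypothetical input, standing in for the text of the scraped node
budget_text <- "Budget: $60,000,000 (estimated)"

# "\\D" matches any non-digit character; the backslash is doubled
# because R string literals need it escaped
digits <- gsub("\\D", "", budget_text)

# Convert the remaining digit string to a number
budget <- as.numeric(digits)
print(budget)

# Show the value without scientific notation
print(format(budget, big.mark = ",", scientific = FALSE))
```

Note that this approach keeps every digit in the text, so if the node contained a second number (a year, for instance) its digits would be glued onto the budget. It works here only because the selected block holds a single number.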
