Reputation: 794
I am experimenting with rvest to learn web scraping with R. I am trying to replicate the Lego example for a couple of other sections of the page, using SelectorGadget to identify the CSS selectors.
I pulled the example from the RStudio tutorial. With the code below, 1 and 2 work, but 3 does not.
library(rvest)
# html() is deprecated as of rvest 0.3.0; read_html() is its replacement
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
# 1 - Get rating
lego_movie %>%
html_node("strong span") %>%
html_text() %>%
as.numeric()
# 2 - Grab actor names
lego_movie %>%
html_nodes("#titleCast .itemprop span") %>%
html_text()
# 3 - Get Meta Score
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text() %>%
as.numeric()
Upvotes: 3
Views: 1623
Reputation: 3194
You can see that before converting to numeric, it returns " 83/100\n":
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text()
# [1] " 83/100\n"
You can use trim=TRUE to strip the surrounding whitespace, including the trailing \n. You still can't convert the result to numeric because it contains a /:
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text(trim=TRUE)
# [1] "83/100"
If you convert this to numeric, you will get NA with a warning, which is expected:
# [1] NA
# Warning message:
# In function_list[[k]](value) : NAs introduced by coercion
If you want the numeric 83 as the final answer, you can use regular expression tools like gsub to remove 100 and / (assuming that the full score is 100 for all movies).
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text(trim=TRUE) %>%
gsub("100|\\/", "", .) %>%
as.numeric()
# [1] 83
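One caveat with the gsub pattern above: it also strips the score itself when the score is 100, since "100/100" turns into an empty string. Here is a minimal sketch that avoids this by dropping the slash and everything after it. It works on the raw string from the example rather than hitting IMDb, and raw_score and parse_score are just illustrative names, not part of rvest:

```r
# Raw text as returned by html_text() in the example above
raw_score <- " 83/100\n"

# Illustrative helper: trim whitespace, then remove the "/" and
# everything that follows it before converting to numeric
parse_score <- function(x) {
  as.numeric(sub("/.*$", "", trimws(x)))
}

score <- parse_score(raw_score)      # 83
perfect <- parse_score("100/100")    # 100
```

In the pipeline this would replace the gsub step, e.g. html_text(trim=TRUE) %>% sub("/.*$", "", .) %>% as.numeric().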
Upvotes: 2
Reputation: 69151
I'm not really up to speed on all of the pipes and associated code, so there are probably some newfangled tools to do this... but given that the answer above gets you to "83/100", you can do something like this:
as.numeric(unlist(strsplit("83/100", "/")))[1]
# [1] 83
Which I guess would look something like this with the pipes:
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text(trim=TRUE) %>%
strsplit(., "/") %>%
unlist(.) %>%
as.numeric(.) %>%
head(., 1)
# [1] 83
Or, as Frank suggested, you could evaluate the expression "83/100" with something like:
lego_movie %>%
html_node(".star-box-details a:nth-child(4)") %>%
html_text(trim=TRUE) %>%
parse(text = .) %>%
eval(.)
# [1] 0.83
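Since eval(parse()) runs whatever text the page returns as R code, a more defensive way to get the same ratio is probably to split on the slash and divide. A rough sketch, assuming the text always has the form "score/maximum"; score_text, parts and ratio are illustrative names:

```r
# Trimmed text as returned by html_text(trim=TRUE) in the example above
score_text <- "83/100"

# Split on the literal "/" and convert both pieces to numbers
parts <- as.numeric(strsplit(score_text, "/", fixed = TRUE)[[1]])

# Score as a fraction of the maximum
ratio <- parts[1] / parts[2]
```

This gives the same 0.83 as the eval version without evaluating scraped text as code.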
Upvotes: 4