I'm practicing web scraping on this page: https://store.steampowered.com/app/261570
I want to pull the review_summary_num_positive_reviews and review_summary_num_reviews values and store them in separate objects. I feel like I'm close, but I can't get the documented examples to work for this case.
My code so far looks like:
library('rvest')
i = 387290
url <- sprintf("https://store.steampowered.com/app/%i", i)
webpage <- read_html(url)
If I try:
html_nodes(webpage, css = "div.review_ctn input")
I get this node set:
[1] <input type="hidden" id="review_appid" value="387290">
[2] <input type="hidden" id="review_default_day_range" value="30">
[3] <input type="hidden" id="review_start_date" value="-1">
[4] <input type="hidden" id="review_end_date" value="-1">
[5] <input type="hidden" id="review_summary_num_positive_reviews" value="15176">
[6] <input type="hidden" id="review_summary_num_reviews" value="15767">
...
Rows 5 and 6 are what I'm after, but pulling elements 5 and 6 by position and then un-listing them feels more complicated than it should be.
Is there a more direct way to get the 15176 and 15767 values from html_nodes() in one line?
I've tried things like css = "div.review_ctn input.value" but I get no results. I think that syntax is meant for when the value sits between the tags, not when it is an attribute of the node itself.
Any thoughts?
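For reference, `.value` in a CSS selector is a class selector: `input.value` matches `<input class="value">`, not inputs carrying a `value` attribute, which is why it returns nothing here. The sketch below contrasts it with the attribute selector `input[value]`; note that either selector only returns nodes, and the attribute text still has to be read with `html_attr()`. The one-node inline HTML is a hypothetical stand-in for the live Steam page so the example runs offline.

```r
library(rvest)

# Hypothetical one-node page standing in for the Steam markup
page <- read_html('<div class="review_ctn">
  <input type="hidden" id="review_summary_num_reviews" value="15767">
</div>')

# ".value" is a CLASS selector: it looks for class="value", so nothing matches
by_class <- html_nodes(page, "div.review_ctn input.value")
length(by_class)  # 0

# "[value]" is an ATTRIBUTE selector: it matches inputs that have a value
# attribute, but it still returns the nodes, not the attribute text
by_attr <- html_nodes(page, "div.review_ctn input[value]")

# The attribute text itself is read with html_attr()
html_attr(by_attr, "value")  # "15767"
```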
Yes, you can select them by id and then read the "value" attribute using html_attr:
library(rvest)
i = 387290
url <- sprintf("https://store.steampowered.com/app/%i", i)
webpage <- read_html(url)
webpage %>%
html_nodes("div.review_ctn #review_summary_num_positive_reviews") %>%
html_attr("value") %>%
as.numeric()
#[1] 15186
webpage %>%
html_nodes("div.review_ctn #review_summary_num_reviews") %>%
html_attr("value") %>%
as.numeric()
#[1] 15778
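If you'd rather make a single html_nodes() call, a grouped (comma-separated) selector matches both inputs at once. This is a minimal sketch, not the answer's original method: the inline HTML is a hypothetical stand-in for `read_html(url)`, and `counts`, `positive_reviews` and `total_reviews` are illustrative names. Keying the values by their `id` keeps the lookup from depending on node order.

```r
library(rvest)

# Hypothetical stand-in for the page returned by read_html(url)
webpage <- read_html('<div class="review_ctn">
  <input type="hidden" id="review_appid" value="387290">
  <input type="hidden" id="review_summary_num_positive_reviews" value="15176">
  <input type="hidden" id="review_summary_num_reviews" value="15767">
</div>')

# A grouped selector matches both inputs in one call (document order)
nodes <- html_nodes(webpage,
  "#review_summary_num_positive_reviews, #review_summary_num_reviews")

# Name each numeric value by its id so lookups don't rely on position
counts <- setNames(as.numeric(html_attr(nodes, "value")),
                   html_attr(nodes, "id"))

# Store the two values in separate objects
positive_reviews <- counts[["review_summary_num_positive_reviews"]]
total_reviews    <- counts[["review_summary_num_reviews"]]
```

In rvest 1.0 and later, html_elements() is the preferred name for html_nodes(); both work in this sketch.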