AI52487963

Reputation: 1273

Scraping html nodes for values within the node?

I'm doing some practice scraping of this page: https://store.steampowered.com/app/261570

I'm looking to pull the review_summary_num_positive_reviews and review_summary_num_reviews values and store them in separate objects. I feel like I'm close, but the documentation doesn't seem to cover this case.

My code so far looks like:

library(rvest)
i <- 387290L  # Steam app ID
url <- sprintf("https://store.steampowered.com/app/%i", i)
webpage <- read_html(url)

If I try:

html_nodes(webpage, css = "div.review_ctn input")

I get a list:

[1] <input type="hidden" id="review_appid" value="387290">
[2] <input type="hidden" id="review_default_day_range" value="30">
[3] <input type="hidden" id="review_start_date" value="-1">
[4] <input type="hidden" id="review_end_date" value="-1">
[5] <input type="hidden" id="review_summary_num_positive_reviews" value="15176">
[6] <input type="hidden" id="review_summary_num_reviews" value="15767">
...

Elements 5 and 6 are what I'm after, but I feel like I'm making things more complicated than necessary by pulling those two elements out and then unlisting them.

Is there a more direct way of getting the 15176 and 15767 values from the html_nodes() function in one line?

I've tried selectors like css = "div.review_ctn input.value", but that returns no results. I suspect that syntax only works when the value is the text between the tags, not when it is stored as an attribute on the node itself.

Any thoughts?

Upvotes: 1

Views: 54

Answers (1)

Ronak Shah

Reputation: 389105

Yes, you can select each node by its id and then extract its "value" attribute with html_attr():

library(rvest)
i <- 387290L
url <- sprintf("https://store.steampowered.com/app/%i", i)

webpage <- read_html(url)

webpage %>%
   html_nodes("div.review_ctn #review_summary_num_positive_reviews") %>%
   html_attr("value") %>%
   as.numeric()

#[1] 15186

webpage %>%
   html_nodes("div.review_ctn #review_summary_num_reviews") %>%
   html_attr("value") %>%
   as.numeric()

#[1] 15778
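
If you want both counts from a single call, one possible variation is a comma-separated selector that matches both ids at once. The sketch below is only an illustration: it parses a small inline HTML snippet (an assumed stand-in for the Steam page, with values copied from the question's output) so it runs without a network connection, and the names `page`, `nodes` and `counts` are made up for the example. It also shows why "input.value" found nothing: ".value" is a CSS class selector, not a way to reach the value attribute.

```r
library(rvest)

# Stand-in for the Steam page: a tiny inline document with the same hidden
# inputs (values copied from the question; the live page will differ).
page <- read_html('
<div class="review_ctn">
  <input type="hidden" id="review_appid" value="387290">
  <input type="hidden" id="review_summary_num_positive_reviews" value="15176">
  <input type="hidden" id="review_summary_num_reviews" value="15767">
</div>')

# ".value" is a class selector, so it matches no <input> in this snippet
n_class_matches <- length(html_nodes(page, "div.review_ctn input.value"))

# A comma-separated selector matches both ids in one call
nodes <- html_nodes(
  page,
  "#review_summary_num_positive_reviews, #review_summary_num_reviews"
)

# Name each value by its id so the two counts cannot be mixed up
counts <- setNames(
  as.numeric(html_attr(nodes, "value")),
  html_attr(nodes, "id")
)
```

Individual values can then be looked up by name, for example counts[["review_summary_num_reviews"]]. Against the live page you would replace the inline snippet with read_html(url) as in the code above.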

Upvotes: 2
