Scraping html nodes for values within the node?

Question

I'm doing some practice scraping of this page: https://store.steampowered.com/app/261570

I'm looking to pull the review_summary_num_positive_reviews and review_summary_num_reviews values and store them in separate objects. I feel like I'm close, but the examples in the documentation don't seem to cover this case.

My code so far looks like:

library('rvest')
i <- 387290
url <- sprintf("https://store.steampowered.com/app/%i", i)
webpage <- read_html(url)

If I try:

html_nodes(webpage, css = "div.review_ctn input")

I get a list:

[1] 
[2] 
[3] 
[4] 
[5] 
[6] 
...

Rows 5 and 6 are what I'm after, but I feel like I'm making things more complicated by pulling elements 5 and 6, then un-listing.

Is there a more direct way of getting the 15176 and 15767 values from the html_nodes() function in one line?

I've tried things like css = "div.review_ctn input.value", but I get no results. I suspect that syntax only applies when the value is the text between the opening and closing tags, not when it is stored as an attribute inside the tag itself.

Any thoughts?

Ronak Shah · Accepted Answer

Yes. Select each input by its id, then extract its "value" attribute with html_attr():

library(rvest)
i <- 387290
url <- sprintf("https://store.steampowered.com/app/%i", i)

webpage <- read_html(url)

webpage %>%
   html_nodes("div.review_ctn #review_summary_num_positive_reviews") %>%
   html_attr("value") %>%
   as.numeric()

#[1] 15186

webpage %>%
   html_nodes("div.review_ctn #review_summary_num_reviews") %>%
   html_attr("value") %>%
   as.numeric()

#[1] 15778
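
As a side note on why "input.value" returned nothing: in CSS, ".value" is a class selector, so it only matches an input whose class is "value"; attributes are read afterwards with html_attr(). Below is a minimal offline sketch of that, which also pulls both counts in a single call. The inline HTML is a hypothetical stand-in for the relevant part of the page (the real markup may differ), and the names snippet, ids, nodes and counts are only illustrative.

```r
library(rvest)

# Hypothetical stand-in for the relevant part of the page (assumed markup)
snippet <- read_html('
  <div class="review_ctn">
    <input type="hidden" id="review_summary_num_positive_reviews" value="15186">
    <input type="hidden" id="review_summary_num_reviews" value="15778">
  </div>')

# "input.value" means <input class="value">, so no node matches here
length(html_nodes(snippet, "div.review_ctn input.value"))

# Build one grouped selector ("#id1, #id2") that matches both inputs
ids <- c("review_summary_num_positive_reviews", "review_summary_num_reviews")
nodes <- html_nodes(snippet, paste0("#", ids, collapse = ", "))

# Read the value attribute and name each number by its node's id,
# so the result does not depend on the order the nodes come back in
counts <- setNames(as.numeric(html_attr(nodes, "value")), html_attr(nodes, "id"))
counts
```

Individual values can then be taken by name, e.g. counts[["review_summary_num_reviews"]]. Against the live page, the same two html_nodes()/html_attr() calls should work with webpage in place of snippet, assuming the ids are unchanged.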
