Reputation: 35
I'm an absolute R beginner and I've been trying to scrape shoe prices from this Sprinter Sports page, with the ultimate goal of having a dataset that will automatically load, on a daily basis, (i) original and (ii) discounted prices for shoes I'm interested in.
The problem is that, of the 24 shoes currently for sale, only 16 have both an "original" and "discounted" price. The remaining 8 don't have a "discounted" price as they are not being sold at a discount. Since the "original" column has 24 observations, and the "discounted" column only has 16, I can't join these together in a dataset.
How can I load shoes without a discount such that their "discounted" column is set to NA? My code is below. Thanks!
library(rvest)   # read_html(), html_nodes(), html_text(); also provides %>%
library(tibble)  # tibble()
# Today's date as yymmdd, e.g. "210719"
date_today <- substring(gsub("-", "", Sys.Date()), 3)
page_sp_merrel <- read_html("https://www.sprintersports.com/pt/sapatilhas-merrell-homem?page=1&per_page=50")
# Each selector is run against the whole page, so only matching elements are returned
price_old_sp_merrel <- page_sp_merrel %>%
  html_nodes(".product-card__info-price-old") %>%
  html_text()
price_new_sp_merrel <- page_sp_merrel %>%
  html_nodes(".product-card__info-price-actual") %>%
  html_text()
product_name_sp_merrel <- page_sp_merrel %>%
  html_nodes(".col-md-3 .product-card__info-name") %>%
  html_text()
# Fails here: the price vectors have different lengths (24 vs. 16)
sp_merrel_df <- tibble(
  price_old = price_old_sp_merrel,
  price_new = price_new_sp_merrel,
  product_name = product_name_sp_merrel,
  date = date_today
)
Upvotes: 2
Views: 62
Reputation: 124213
This can be done by looping over the product cards instead of querying the whole page. For each card, html_node() returns exactly one result per selector, and html_text() turns a missing element into NA, so every card yields one complete row of the data frame:
library(rvest)
date_today = substring(gsub("-", "", Sys.Date()),3)
page_sp_merrel <- read_html("https://www.sprintersports.com/pt/sapatilhas-merrell-homem?page=1&per_page=50")
sp_merrel_df <- page_sp_merrel %>%
  html_nodes(".product-card__info-data") %>%
  purrr::map_df(function(x) {
    data.frame(
      product_name = html_node(x, ".product-card__info-name") %>% html_text(),
      price_old = html_node(x, ".product-card__info-price-old") %>% html_text(),
      price_new = html_node(x, ".product-card__info-price-actual") %>% html_text(),
      date = date_today
    )
  })
head(sp_merrel_df)
#>                  product_name price_old price_new   date
#> 1          Merrell Riverbed 3   69,99 €   59,99 € 210719
#> 2 Sapatilhas Montanha Merrell      <NA>  114,99 € 210719
#> 3      Merrell Moab Adventure      <NA>   99,99 € 210719
#> 4          Merrel Moab 2 Vent   99,99 €   79,99 € 210719
#> 5          Merrell Alverstone      <NA>   79,99 € 210719
#> 6           Merrell Chameleon      <NA>  129,99 € 210719
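To see why the per-card approach fills in NA, here is a minimal offline sketch. The HTML snippet and its class names (card, name, old, new) are made up for illustration and are not taken from the Sprinter Sports page. It also shows that html_node() is vectorised over a set of nodes, so an explicit loop is optional:

```r
library(rvest)

# Toy page (hypothetical markup): the second card has no old price
page <- minimal_html('
  <div class="card"><span class="name">Shoe A</span>
    <span class="old">69.99</span><span class="new">59.99</span></div>
  <div class="card"><span class="name">Shoe B</span>
    <span class="new">114.99</span></div>
')

# Whole-page query: only elements that exist are returned,
# so this has length 1 even though there are 2 cards
n_old_found <- length(html_nodes(page, ".old"))

# Per-card query: html_node() returns one result per card,
# and html_text() gives NA where the element is missing
cards <- html_nodes(page, ".card")
toy_df <- data.frame(
  product_name = html_text(html_node(cards, ".name")),
  price_old    = html_text(html_node(cards, ".old")),
  price_new    = html_text(html_node(cards, ".new"))
)
```

Here toy_df has one row per card, with price_old set to NA for the second one. On the real page the same pattern should apply by swapping in the card selector and the three class names used in the answer above.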
Upvotes: 1