rvest html_nodes returns {xml_nodeset (0)}

Question

I've been trying to scrape this page using rvest and SelectorGadget. I can scrape the product description, but not the values I want (the ones under .product-payment). When I run this code:

library(dplyr)
library(rvest)

read_html("https://www.dicasanet.com.br/material-de-construcao") %>%
  html_nodes(".product-payment")

I keep getting the result "{xml_nodeset (0)}". I noticed that, unlike other values (like the name of the product), this is not a div.a, but a div.div instead. Is there another way to get these values? How should I proceed? Thanks in advance!

QHarr · Accepted Answer

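The data is loaded dynamically from a JavaScript object (dataLayer) inside a script tag, so it is not present in the HTML nodes that read_html returns, which is why the selector matches nothing. You can regex the object out of the response text, parse it with jsonlite into an R list, and then extract what you want for the products.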

library(magrittr)
library(rvest)
library(stringr)
library(jsonlite)

page <- read_html('https://www.dicasanet.com.br/loja/catalogo.php?loja=790930&categoria=1')

# Serialize the page, capture the array assigned to dataLayer, and parse it as JSON
data <- page %>% 
  toString() %>% 
  stringr::str_match('dataLayer = (\\[.*\\])') %>% 
  .[2] %>% 
  jsonlite::parse_json()

print(data[[1]]$listProducts)
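
To try the same idea without hitting the site, here is a minimal offline sketch. The HTML string, the product names and the name/price fields are made up for illustration; the real dataLayer object may use different field names, so inspect data[[1]]$listProducts before relying on any of them.

```r
library(stringr)
library(jsonlite)

# Hypothetical page source: the product data sits in a script tag,
# while the .product-payment div is empty in the static HTML.
html <- '<html><head><script>dataLayer = [{"listProducts": [{"name": "Cimento", "price": 29.9}, {"name": "Areia", "price": 12.5}]}];</script></head><body><div class="product-payment"></div></body></html>'

# Capture the JSON array assigned to dataLayer (column 2 is the capture group)
json_text <- str_match(html, "dataLayer = (\\[.*\\])")[, 2]

# parse_json returns nested R lists by default
data <- parse_json(json_text)
products <- data[[1]]$listProducts

# Collect the (assumed) fields of interest into a data frame
prices <- data.frame(
  name  = vapply(products, function(p) p$name, character(1)),
  price = vapply(products, function(p) as.numeric(p$price), numeric(1))
)
print(prices)
```

The greedy .* runs to the last closing bracket in the text, which is fine here but could over-capture if the page has other bracketed content after dataLayer, so check the captured string if parse_json fails.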
