Reputation: 11
I'm starting to scrape some websites from Argentina. I want to scrape these particular pages: "https://www.disco.com.ar/prod/88953/aderezo-mayonesa-natura-237-gr" or "https://www.disco.com.ar/prod/416680/cerveza-rubia-brahma-chopp-1-l-botella-retornable"
I use the "rvest" package to collect prices and names from other websites. I'm trying to fetch the page with the following code:
library(rvest)
url_1 <- "https://www.disco.com.ar/prod/88953/aderezo-mayonesa-natura-237-gr"
page <- read_html(url_1)
I want to scrape the entire page, including the price and the name of those particular products. My problem is that rvest only retrieves the first screen, the one shown before you answer the location prompt that appears in Chrome. Once you click "allow" or "don't allow", Chrome gives access to all of the HTML. I attach reference screenshots: I want to reach the product, but I can only get the first screen with the logo.
How can I make the information accessible via read_html? Do I have to use BeautifulSoup or something else?
Any help is more than welcome and I thank the entire community.
Upvotes: 0
Views: 98
Reputation: 45432
You need to make a call to:
POST /Geolocalizacion/Geolocalizacion.aspx/GuardarLocalizacion
and save the cookies to your html_session. The product information is stored as JSON in an input tag with the name hfProductData, under its value attribute:
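library(rvest)
library(httr)
library(jsonlite)

# Answer the geolocation prompt with "do not locate" and keep the response cookies
r <- POST(
  "https://www.disco.com.ar/Geolocalizacion/Geolocalizacion.aspx/GuardarLocalizacion",
  content_type("application/json"),
  body = toJSON(
    list(latitud = NA, longitud = NA, noLocalizar = TRUE),
    auto_unbox = TRUE
  ),
  encode = "json"
)

# Turn the cookie table into a named character vector for set_cookies()
cookieList <- cookies(r)
cookies <- cookieList$value %>% setNames(cookieList$name)

# Request the product page with those cookies and read the input's value attribute
url <- "https://www.disco.com.ar/prod/88953/aderezo-mayonesa-natura-237-gr"
resp <- html_session(url, set_cookies(cookies)) %>%
  html_nodes('input[name="hfProductData"]') %>%
  html_attr("value")

# The attribute value is a JSON string; parse it into an R list
print(fromJSON(resp))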
Output:
$DescripcionArticulo
[1] "Aderezo Mayonesa Natura 237 Gr"
$Grupo_Marca
[1] "NATURA"
$IdArchivoZoom
[1] ""
$IdArchivoBig
[1] "444812.jpg"
$IdArchivoSmall
[1] "444664.jpg"
$IdArticulo
[1] 88953
$Precio
[1] "49.52"
$unidadPedida
[1] "Un"
$Pesable
[1] "False"
$Stock
[1] "84.00"
$CucardaOferta
[1] ""
$Descuentos
list()
$ImgMxM
[1] "11510117005.jpg"
$Codigo
[1] "11510117005"
$Categoria
[1] "Almacén->Aderezos->Mayonesas"
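Since the question asks only for the name and price, here is a minimal sketch of pulling those fields out of the parsed JSON into a one-row data frame. It runs offline on a shortened JSON string rebuilt from the output above. The helper name extract_product and the column names are hypothetical, and it assumes other product pages expose the same field names, which has not been verified.

```r
library(jsonlite)

# Shortened sample of the hfProductData value, rebuilt from the output shown above.
# Assumption: other product pages use the same field names.
product_json <- paste0(
  '{"DescripcionArticulo":"Aderezo Mayonesa Natura 237 Gr",',
  '"Grupo_Marca":"NATURA","IdArticulo":88953,',
  '"Precio":"49.52","Stock":"84.00"}'
)

# Hypothetical helper: parse the JSON text and keep only the fields of interest.
extract_product <- function(json_text) {
  p <- fromJSON(json_text)
  data.frame(
    id    = p$IdArticulo,
    name  = p$DescripcionArticulo,
    brand = p$Grupo_Marca,
    price = as.numeric(p$Precio),  # Precio arrives as a string
    stock = as.numeric(p$Stock),   # Stock arrives as a string too
    stringsAsFactors = FALSE
  )
}

product <- extract_product(product_json)
print(product)
```

In the real workflow, the resp value returned by html_attr("value") would be passed to the helper instead of the sample string, and the rows for several products could be combined with rbind.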
Upvotes: 0