Reputation: 85
I'm trying to grab a table of data using read_html from the r package rvest.
I've tried the below code:
library(rvest)
raw <- read_html("https://demanda.ree.es/movil/peninsula/demanda/tablas/2016-01-02/2")
I don't believe the above pulled the data from the table, since I see 'raw' is a list of 2:
'node:<externalptr>' and 'doc:<externalptr>'
I've tried grabbing the xpath too:
html_nodes(raw,xpath = '//*[(@id = "tabla_generacion")]//*[contains(concat( " ", @class, " " ), concat( " ", "ng-scope", " " ))]')
Any advice on what to try next?
Thanks.
Upvotes: 2
Views: 2875
Reputation: 6659
This website is using angular to make a call to get the data. You can just use that call to get the raw JSON. The response is not pure JSON, so you can't just run fromJSON(url)
, you have to download the data and get rid of the non-JSON stuff before you parse it.
library(jsonlite)
library(httr)
url <- "https://demanda.ree.es/WSvisionaMovilesPeninsulaRest/resources/demandaGeneracionPeninsula?callback=angular.callbacks._2&curva=DEMANDA&fecha=2016-01-02"
a <- GET(url)
a <- content(a, as="text")
# get rid of the non-JSON stuff...
a <- gsub("^angular.callbacks._2\\(", "", a)
a <- gsub("\\);$", "", a)
df <- fromJSON(a, simplifyDataFrame = TRUE)
I found this by pushing F12 in Chrome and looking at the "Sources" tab. The data to fill the table had to come from somewhere... so it's just a matter of figuring out where. I was unable to use rvest to scrape the table. I'm not sure if that call to get the data was executed in R as it was in chrome... so there may have been no data for rvest to scrape.
Upvotes: 5