Markus Knopfler

Reputation: 637

Web scraping with R with scroll down

I am looking to download the first two columns ("GAS DAY STARTED ON" and "GAS IN STORAGE") from the following URL:

https://agsi.gie.eu/#/historical/eu

The default period is set to "LAST MONTH" and I would need "ALL".

Is someone able to tell me what package I can use to achieve this type of task? There is also a free API, but I did not manage to get far with that either.

Grateful for every input! Many thanks in advance!

Upvotes: 0

Views: 365

Answers (1)

hrbrmstr

Reputation: 78792

Let's try to steer you closer to the API path. If you have an API key you can pass it directly to the following function, but you shouldn't. Instead, put it in your ~/.Renviron as:

AGSI_KEY=thekeytheygaveyou

and restart your R session. It will then be used automagically.
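If you want to confirm the key was picked up after the restart, something like the following should do. It is only a sketch: has_agsi_key() is a hypothetical helper name, not part of any package, and it checks only that the variable is non-empty, not that the key is valid.

```r
# Hypothetical helper: TRUE when the AGSI_KEY environment variable is set.
# Sys.getenv() returns "" for an unset variable, so nzchar() is a cheap guard.
has_agsi_key <- function() {
  nzchar(Sys.getenv("AGSI_KEY"))
}

if (has_agsi_key()) {
  message("AGSI_KEY found")
} else {
  message("AGSI_KEY is not set; add it to ~/.Renviron and restart R")
}
```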

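The following function takes start/end dates. As written it only normalizes them and never sends them to the API, so the result is not restricted to that range (the output below runs through 2018-11-19):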

get_agsi_data <- function(start, end, agsi_api_key = Sys.getenv("AGSI_KEY")) {

  start[1] <- as.character(as.Date(start[1]))
  end[1] <- as.character(as.Date(end)[1])

  httr::GET(
    url = "https://agsi.gie.eu/api/data/eu", # NOTE THE HARDCODING FOR eu
    httr::add_headers(`x-key` = agsi_api_key),
    httr::user_agent("you@example.com") # REPLACE THIS WITH YOUR EMAIL ADDRESS
  ) -> res

  httr::stop_for_status(res) # stops with an error if the API returns an HTTP error status

  out <- httr::content(res, as = "text", encoding = "UTF-8")

  out <- jsonlite::fromJSON(out)

  sapply(out$info, function(x) { # the info element is an ugly list so we need to make it better
    if (length(x)) {
      x <- paste0(x, collapse = "; ") 
    } else {
      NA_character_
    }
  }) -> info

  out$info <- info

  # namespace-qualify the readr column helpers so this works without library(readr)
  readr::type_convert(
    df = out,
    col_types = readr::cols(
      status = readr::col_character(),
      gasDayStartedOn = readr::col_date(format = ""),
      gasInStorage = readr::col_double(),
      full = readr::col_double(),
      trend = readr::col_double(),
      injection = readr::col_double(),
      withdrawal = readr::col_double(),
      workingGasVolume = readr::col_double(),
      injectionCapacity = readr::col_double(),
      withdrawalCapacity = readr::col_double()
    )
  ) -> out

  class(out) <- c("tbl_df", "tbl", "data.frame")

  out

}

xdf <- get_agsi_data("2018-06-01", "2018-10-01")

xdf
## # A tibble: 2,880 x 11
##    status gasDayStartedOn gasInStorage  full trend injection withdrawal workingGasVolume injectionCapacity
##  * <chr>  <date>                 <dbl> <dbl> <dbl>     <dbl>      <dbl>            <dbl>             <dbl>
##  1 E      2018-11-19              918.  86.1 -0.41      343.      4762.            1067.            11469.
##  2 E      2018-11-18              923.  86.5 -0.22      534.      2841.            1067.            11469.
##  3 E      2018-11-17              925.  86.7 -0.2       649.      2796.            1067.            11469.
##  4 E      2018-11-16              927.  86.9 -0.24      492.      3014.            1067.            11469.
##  5 E      2018-11-15              930.  87.1 -0.16      503.      2210.            1067.            11469.
##  6 E      2018-11-14              931.  87.3 -0.1       605.      1682.            1067.            11469.
##  7 E      2018-11-13              933.  87.4 -0.07      651.      1438.            1067.            11469.
##  8 E      2018-11-12              933.  87.5 -0.05      833.      1391.            1067.            11468.
##  9 E      2018-11-11              934.  87.5  0.09     1607.       659.            1067.            11478.
## 10 E      2018-11-10              933.  87.4  0.06     1458.       796.            1067.            11478.
## # ... with 2,870 more rows, and 2 more variables: withdrawalCapacity <dbl>, info <chr>
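
Since the question only asks for the first two columns, here is a rough sketch of trimming the result down. To keep it runnable without an API key it uses a small stand-in data frame whose column names match the output above and whose values are the rounded figures printed there. filter_gas_days() is a hypothetical helper for narrowing to a date range on the client side, since the function above does not do that for you.

```r
# Stand-in for the tibble returned by get_agsi_data(); values are the rounded
# figures from the first three rows of the printed output.
xdf <- data.frame(
  status = "E",
  gasDayStartedOn = as.Date(c("2018-11-19", "2018-11-18", "2018-11-17")),
  gasInStorage = c(918, 923, 925),
  full = c(86.1, 86.5, 86.7),
  stringsAsFactors = FALSE
)

# Keep only the two columns asked for in the question.
storage <- xdf[, c("gasDayStartedOn", "gasInStorage")]

# Hypothetical helper: keep rows whose gas day falls within [start, end].
filter_gas_days <- function(df, start, end) {
  keep <- df$gasDayStartedOn >= as.Date(start) &
    df$gasDayStartedOn <= as.Date(end)
  df[keep, , drop = FALSE]
}

filter_gas_days(storage, "2018-11-18", "2018-11-19")
```

With the real result you would skip the stand-in and apply the same two steps to xdf directly.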

eu is hardcoded but it should be straightforward to adapt the function for other API endpoints.
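
One way to lift the hardcoding is to build the URL from an argument. This is a minimal sketch: agsi_endpoint() is a hypothetical helper, and the only path confirmed here is the eu one used above. Any other area code is an assumption you would need to check against the API documentation.

```r
# Hypothetical helper: build the data endpoint URL for a given area code.
# Only "eu" is known to work from the answer above; other codes are assumptions.
agsi_endpoint <- function(area = "eu") {
  stopifnot(is.character(area), length(area) == 1, nzchar(area))
  paste0(
    "https://agsi.gie.eu/api/data/",
    utils::URLencode(area, reserved = TRUE) # escape anything unsafe in a path
  )
}

agsi_endpoint() # same URL as the hardcoded one in get_agsi_data()
```

You could then give get_agsi_data() an area argument and pass agsi_endpoint(area) as the url.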

Upvotes: 2
