PK1998

Reputation: 169

Scraping data from JavaScript graph to R

I am trying to automate retrieving the data behind Figure 1: Electricity consumption relative to 2019 in this article. I have no trouble scraping a normal page, but this chart is rendered with JavaScript, and I have no idea how to proceed or where to find the data the chart uses.

Upvotes: 0

Views: 301

Answers (2)

Allan Cameron

Reputation: 173803

This is a relatively tricky scraping job. The data you are looking for is in the page linked by Eric Truett. It is stored as a JSON string buried in the text of a JavaScript call. The steps you need are therefore:

  1. Identify the page that actually contains the data (as already done by @EricTruett)
  2. Get the HTML page as a text string
  3. Strip out the part of the string you want
  4. Parse the JSON
  5. Get the element of the resulting list that contains the data you want
  6. Convert that element into the format you want

I can show you how to do steps 2, 3, and 4, but steps 5 and 6 depend on what exactly you need your output to be, which you haven't specified in your question. I have just guessed here:

# Step 1: Get the correct url (usually done via developer tools in a browser)
uri <- "https://e.infogram.com/8dc2a0f6-6c05-4e0a-91c3-4122c56989d9?src=embed"

# Step 2: Read the html into memory as a single text string:
page <- paste(readLines(uri), collapse = "\n")

# Step 3: Strip out the JSON you need. This can only really be done by scanning the
#         html for the data you want and finding unique delimiters at either end,
#         carving out the string with regexes and tidying up either end if needed.
page <- strsplit(page, "\"data\":\\[{3}", useBytes = TRUE)[[1]][2]
json <- paste0("[[[", strsplit(page, "]]]", useBytes = TRUE)[[1]][1], "]]]")

# Step 4: Parse the JSON. Use an existing library such as jsonlite for this
map_data <- jsonlite::fromJSON(json)

# Step 5: Find the element(s) you want in the resulting data structure. Here, the
#         result is a list with several elements, and from visual inspection, element
#         9 appears to be a nice tabular array containing useful data
useful_array <- map_data[[9]]

# Step 6: Arrange the result however you like. Here, I have just selected out some
#         useful columns and converted to a tibble for pretty printing:
df <- dplyr::as_tibble(useful_array[, c(1, 2, 6)])
df <- setNames(df[which(df$V2 != ""), ], c("Country", "Percent", "Change"))

And the result looks like this:

df
#> # A tibble: 28 x 3
#>    Country  Percent Change 
#>    <chr>    <chr>   <chr>  
#>  1 Austria  90.44%  -10.00%
#>  2 Belgium  85.39%  -15.00%
#>  3 Bulgaria 94.54%  -5.00% 
#>  4 Croatia  87.16%  -13.00%
#>  5 Denmark  98.82%  -1.00% 
#>  6 Estonia  97.20%  -3.00% 
#>  7 Finland  92.40%  -8.00% 
#>  8 France   85.87%  -14.00%
#>  9 Germany  91.86%  -8.00% 
#> 10 Greece   108.48% 8.00%  
#> # ... with 18 more rows
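
The Percent and Change columns in the output above are character strings, so they would need converting before any arithmetic or plotting. A minimal sketch of one way to do that, using a small toy data frame that copies three rows from the printed output; `pct_to_num` is a hypothetical helper name, not part of any package:

```r
# Toy data copied from three rows of the printed output above
df <- data.frame(
  Country = c("Austria", "Belgium", "Greece"),
  Percent = c("90.44%", "85.39%", "108.48%"),
  Change  = c("-10.00%", "-15.00%", "8.00%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace, drop the trailing "%", convert to numeric
pct_to_num <- function(x) as.numeric(sub("%$", "", trimws(x)))

df$Percent <- pct_to_num(df$Percent)
df$Change  <- pct_to_num(df$Change)

# Both columns are now numeric
str(df)
```

The same helper could be applied to the real `df` from step 6, assuming every non-empty value follows the "number followed by %" pattern seen in the output.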

You will probably need to dig around in map_data to get the actual data you need.
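
One way to do that digging is to list the top-level elements and pick out the ones that look tabular. The sketch below uses a small stand-in list rather than the real `map_data`, whose contents will differ; `is_tabular`, `tabular_idx` and `dims` are illustrative names only:

```r
# Stand-in for map_data: a list mixing a scalar, a vector and a character matrix.
# The real object will differ; this only illustrates the inspection pattern.
map_data <- list(
  "title",
  c("a", "b"),
  matrix(c("Austria", "Belgium", "90.44%", "85.39%"), nrow = 2)
)

# One-line overview of each top-level element
str(map_data, max.level = 1)

# Flag elements that have exactly two dimensions (matrices or data frames)
is_tabular <- vapply(map_data, function(x) length(dim(x)) == 2, logical(1))
tabular_idx <- which(is_tabular)

# Dimensions of each tabular element, to help choose the right one
dims <- lapply(map_data[tabular_idx], dim)
print(tabular_idx)
print(dims)
```

On the real data, `head(map_data[[i]])` for each index in `tabular_idx` should be enough to see which array holds the figures you want.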

Upvotes: 2

Eric Truett

Reputation: 3010

The main page embeds its graphics from https://e.infogram.com/8dc2a0f6-6c05-4e0a-91c3-4122c56989d9?src=embed. If you look at the source of the embedded page, the data is stored in a JavaScript variable, window.infographicData.
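
A rough sketch of pulling that variable out in R follows. It runs on a synthetic HTML string, and the exact shape of the assignment (`window.infographicData=...;</script>`) and of the JSON inside it are assumptions, so the pattern may need adjusting against the real page source:

```r
# Synthetic stand-in for the embedded page's HTML. The exact form of the
# assignment and the JSON structure are assumptions made for illustration.
html <- paste0(
  "<html><body><script>",
  "window.infographicData={\"elements\":[{\"data\":[[[\"Austria\",\"90.44%\"]]]}]};",
  "</script></body></html>"
)

# Capture everything between the assignment and the ";</script>" that ends it.
# (?s) lets "." match newlines in case the JSON spans several lines.
pattern <- "(?s)window\\.infographicData\\s*=\\s*(.*?);\\s*</script>"
match <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]
json <- match[2]  # first capture group: the raw JSON text

# Parse only if jsonlite is installed; keep the nested list structure as-is
if (requireNamespace("jsonlite", quietly = TRUE)) {
  info <- jsonlite::fromJSON(json, simplifyVector = FALSE)
  str(info, max.level = 2)
}
```

For the real page, `html` would be the page source read in as a single string, for example with `paste(readLines(url), collapse = "\n")`.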

Upvotes: 2
