Reputation: 169
I am trying to automate getting the data from Figure 1: Electricity consumption relative to 2019 in this article. I have no issues scraping a normal page, but this chart is rendered with JavaScript and I have no idea how to proceed or where to find the data the chart uses.
Upvotes: 0
Views: 301
Reputation: 173803
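This is a relatively tricky scraping job. The data you are looking for is in the page linked by Eric Truett, as a JSON string buried in that page's JavaScript. The steps you need are therefore:

1. Find the correct URL (the embedded chart page, not the article itself).
2. Read the HTML into memory as a single text string.
3. Cut the JSON string out of the HTML.
4. Parse the JSON.
5. Find the element(s) you want in the resulting data structure.
6. Arrange the result the way you need it.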
I can show you how to do steps 2, 3, and 4, but steps 5 and 6 depend on what exactly you need your output to be, which you haven't specified in your question. I have just guessed here:
```r
# Step 1: Get the correct url (usually done via developer tools in a browser)
uri <- "https://e.infogram.com/8dc2a0f6-6c05-4e0a-91c3-4122c56989d9?src=embed"

# Step 2: Read the html into memory as a single text string
# (warn = FALSE silences the "incomplete final line" warning):
page <- paste(readLines(uri, warn = FALSE), collapse = "\n")

# Step 3: Strip out the JSON you need. This can only really be done by scanning the
# html for the data you want and finding unique delimiters at either end,
# carving out the string with regexes and tidying up either end if needed.
page <- strsplit(page, "\"data\":\\[{3}", useBytes = TRUE)[[1]][2]
json <- paste0("[[[", strsplit(page, "]]]", fixed = TRUE, useBytes = TRUE)[[1]][1], "]]]")

# Step 4: Parse the JSON. Use an existing library such as jsonlite for this
map_data <- jsonlite::fromJSON(json)

# Step 5: Find the element(s) you want in the resulting data structure. Here, the
# result is a list with several elements, and from visual inspection, element
# 9 appears to be a nice tabular array containing useful data
useful_array <- map_data[[9]]

# Step 6: Arrange the result however you like. Here, I have just selected some
# useful columns, dropped rows with an empty second column, and converted to a
# tibble for pretty printing:
df <- dplyr::as_tibble(useful_array[, c(1, 2, 6)])
df <- setNames(df[which(df$V2 != ""), ], c("Country", "Percent", "Change"))
```
And the result looks like this:
```r
df
#> # A tibble: 28 x 3
#>    Country  Percent Change
#>    <chr>    <chr>   <chr>
#>  1 Austria  90.44%  -10.00%
#>  2 Belgium  85.39%  -15.00%
#>  3 Bulgaria 94.54%  -5.00%
#>  4 Croatia  87.16%  -13.00%
#>  5 Denmark  98.82%  -1.00%
#>  6 Estonia  97.20%  -3.00%
#>  7 Finland  92.40%  -8.00%
#>  8 France   85.87%  -14.00%
#>  9 Germany  91.86%  -8.00%
#> 10 Greece   108.48% 8.00%
#> # ... with 18 more rows
```
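The `Percent` and `Change` columns come back as character strings such as `"90.44%"`. If you want to do arithmetic on them, a small conversion along these lines should work. This is only a sketch: `parse_percent` is a hypothetical helper name, and it assumes every value is either a number followed by `%` or an empty string.

```r
# Hypothetical helper: turn strings like "90.44%" or "-10.00%" into numbers
# (still in percentage points; divide by 100 if you want fractions).
parse_percent <- function(x) {
  # Drop the percent sign and surrounding whitespace, then coerce to numeric.
  # An empty string becomes NA.
  as.numeric(sub("%", "", trimws(x), fixed = TRUE))
}

# Applied to the tibble from above:
# df$Percent <- parse_percent(df$Percent)
# df$Change  <- parse_percent(df$Change)
```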
You will probably need to dig around in `map_data` to get the actual data you need.
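For that digging, `str(map_data, max.level = 1)` is a good start. If the list is long, a one-row-per-element summary of class and dimensions makes tabular elements (like element 9 here) easier to spot. This is a minimal sketch: `summarise_elements` is a hypothetical helper, not part of jsonlite, and it assumes `map_data` is a plain list as returned in Step 4.

```r
# Hypothetical helper: one row per top-level element, showing its class and
# its dimensions (or its length, if it has no dim attribute).
summarise_elements <- function(x) {
  data.frame(
    index = seq_along(x),
    class = unname(vapply(x, function(el) class(el)[1], character(1))),
    dims  = unname(vapply(x, function(el) {
      d <- dim(el)
      if (is.null(d)) as.character(length(el)) else paste(d, collapse = " x ")
    }, character(1))),
    stringsAsFactors = FALSE
  )
}

# Usage: look for "matrix" or "data.frame" rows with plausible dimensions.
# summarise_elements(map_data)
```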
Upvotes: 2
Reputation: 3010
The main page embeds graphics from https://e.infogram.com/8dc2a0f6-6c05-4e0a-91c3-4122c56989d9?src=embed. If you look at the source of the embedded page, the data is in a JavaScript variable, `window.infographicData`.
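Instead of splitting on `"data":[[[` and `]]]`, you could cut out the whole object assigned to that variable by matching braces. The sketch below assumes the page contains something like `window.infographicData = {...}` where the value is a JSON object literal (double-quoted strings only); `extract_js_object` is a hypothetical helper, not an existing package function.

```r
# Hypothetical helper: return the JSON object literal assigned to a JavaScript
# variable, found by matching braces from the first "{" after the variable name.
# Braces inside double-quoted strings (including escaped quotes) are ignored.
extract_js_object <- function(page, var = "window.infographicData") {
  start <- as.integer(regexpr(var, page, fixed = TRUE))
  if (start == -1L) stop("variable not found: ", var)
  chars <- strsplit(substring(page, start), "")[[1]]
  open <- match("{", chars)
  if (is.na(open)) stop("no object literal after ", var)
  depth <- 0L
  in_string <- FALSE
  escaped <- FALSE
  for (i in seq(open, length(chars))) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      # Back at depth 0: this "}" closes the object we started at `open`.
      if (depth == 0L) return(paste(chars[open:i], collapse = ""))
    }
  }
  stop("unbalanced braces after ", var)
}
```

Usage would then be something like `json <- extract_js_object(paste(readLines(uri, warn = FALSE), collapse = "\n"))` followed by `jsonlite::fromJSON(json)`. The character-by-character loop is slow on very large pages, but it only runs from the variable name to the end of its object.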
Upvotes: 2