Reputation: 73
I'm trying to import some data from an API that has an unusual nested JSON structure. Here's a snippet of what the JSON data looks like:
{"series":[{"series_id":"NG.N9010US2.M","name":"U.S. Natural Gas Gross Withdrawals, Monthly","units":"Million Cubic Feet","f":"M","unitsshort":"MMcf","description":"U.S. Natural Gas Gross Withdrawals","copyright":"None","source":"EIA, U.S. Energy Information Administration","iso3166":"USA","geography":"USA","start":"197301","end":"202112","updated":"2022-02-28T17:53:37-0500","data":[["202112",3683216],["202111",3551244],["202110",3595255],["202109",3412770],["202108",3530753],["202107",3490995],["202106",3391187],["202105",3510356],["202104",3408930],["202103",3482323],["202102",2923896],["202101",3506258],["202012",3489506],["202011",3352374],["202010",3363756],["202009",3265272],["202008",3349817],["202007",3373594],["202006",3217496],["202005",3285105],["202004",3374224],["202003",3582414],["202002",3363434],["202001",3596775],["201912",3622512],["201911",3509293],["201910",3551970],["201909",3396571],["201908",3448106],["201907",3395984],["201906",3299794],["201905",3424018],["201904",3315270],["201903",3383042],["201902",3056916],["201901",3376735],["201812",3391473],["201811",3259912],["201810",3314676],["201809",3153978],["201808",3206177],["201807",3138242],["201806",2972738],["201805",3098558],["201804",2976108],["201803",3069284],["201802",2751215],["201801",2993178],["201712",3025419],["201711",2897003],["201710",2911373],["201709",2763285],["201708",2769670],["201707",2753997],["201706",2688422],["201705",2778965],["201704",2689967],["201703",2797117],["201702",2493267],["201701",2723628]]}
I'm not very familiar with different flavors of JSON. As you can see, the data is nested within the series information, but it's not in the typical paired format. I have been able to import the data as a nested list using fromJSON in the jsonlite package, but I'm stuck on how to convert it into a data frame. I would like the series information to be maintained with a new row for each pair of data observations.
I haven't gotten very far with this, so my code so far is pretty simple. I was able to extract the data series as a list using the $ operator:
gas <- fromJSON("gas production.json", flatten=TRUE)
gas$series$data
Thanks!
Upvotes: 2
Views: 343
Reputation: 173793
From looking at your data structure, you could do something like this:
result <- jsonlite::fromJSON("gas_production.json")
df <- do.call(rbind, lapply(result$series[[1]]$data,
function(x) setNames(as.data.frame(x), c("date", "value"))))
df$date <- as.Date(paste(substr(df$date, 1, 4), substr(df$date, 5, 6), "01", sep = "-"))
Now the data frame contains two columns, date
and value
head(df)
date value
1 2021-12-01 3683216
2 2021-11-01 3551244
3 2021-10-01 3595255
4 2021-09-01 3412770
5 2021-08-01 3530753
6 2021-07-01 3490995
We could use this for example to plot the series:
library(ggplot2)
ggplot(df, aes(date, value)) +
geom_line(color = "deepskyblue4") +
labs(title = "Monthly U.S. Natural Gas Gross Withdrawals",
x = "Date", y = "Millions of cubic feet") +
theme_minimal()
EDIT
Here's a full reprex demonstrating the way to get all series in a single data frame:
url <- paste0("https://raw.githubusercontent.com/shrubberr/",
"energy-data/main/gas%20production.json")
result <- jsonlite::parse_json(httr::content(httr::GET(url), "text"))
df <- do.call(rbind, lapply(result$series, function(d) {
y <- do.call(rbind, lapply(d$data,
function(x) {
if(is.null(x[[2]])) x[[2]] <- NA_real_;
setNames(as.data.frame(x), c("Date", "Value"))}))
y$Name <- d$name
y
}))
df$Date <- as.Date(paste(substr(df$Date, 1, 4),
substr(df$Date, 5, 6), "01", sep = "-"))
library(ggplot2)
ggplot(df, aes(Date, Value, color = Name)) +
geom_line() +
labs(title = "Natural Gas Gross Withdrawals",
x = "Date", y = "Millions of cubic feet", color = "") +
theme_minimal() +
theme(legend.position = "top",
legend.direction = "vertical")
#> Warning: Removed 84 row(s) containing missing values (geom_path).
Created on 2022-03-30 by the reprex package (v2.0.1)
Upvotes: 2