Reputation: 175
Extracting tidy data from a simple JSON is trivial with the tidyjson
package (https://cran.r-project.org/web/packages/tidyjson/vignettes/introduction-to-tidyjson.html)
I haven't been able to apply this logic to a complex nested JSON structure. Similar question such as this one (how do you extract data from nested json data in R) are too specific so I can't extrapolate it to other cases.
A more general case could be given by this structure (see working reproducible examples here: 1.4 Example requests: https://www.ree.es/en/apidatos)
{
"data": {
"type": "WIDGET TYPE",
"id": "WIDGET_ID",
"attributes": {
"title": "WIDGET NAME",
"last-update": "2019-02-01T08:26:34.000+01:00",
"description": "WIDGET DESCRIPTION",
},
"meta": {
"cache-control": {
"cache": "HIT",
"expireAt": "2019-03-01T17:18:22"
}
}
},
"included": [
{
"type": "INDICATOR_1 TYPE",
"id": "INDICADOR_1_ID",
"groupId": null,
"attributes": {
"title": "INDICADOR_1 NAME",
"description": "INDICADOR_1 DESCRIPTION",
"color": "#2fa688",
"type": "INDICADOR_1 TYPE",
"magnitude": "INDICADOR_1 MAGNITUDE",
"composite": false,
"last-update": "2019-02-19T08:26:34.000+01:00",
"values": [
{
"value": 12345,
"percentage": "VALUE BETWEEN 0 AND 1",
"datetime": "2019-02-04T20:44:00.000+01:00"
}
]
},
{
"type": "INDICATOR_2 TYPE",
"id": "INDICADOR_1_ID",
"groupId": null,
"attributes": {
…
}
}
]
}
}
The first level has an object "data"
and an array "included"
The "included"
array has one object for each indicator
In each of these objects there is an "attributes"
object with a "values"
array where the final data are located: "value"
, "percentage"
and "datetime"
The goal would be to extract the data to a tidy dataframe with the columns "type"
, "title"
, "value"
, "percentage"
and "datetime"
Upvotes: 0
Views: 944
Reputation: 175
I found out after a lot of trial and error. I answer myself in case it can help anyone else departing from the json
object:
library(tidyjson)
json %>% # our json object
enter_object(included) %>% # to enter the object where the data are
gather_array() %>% spread_all() %>% # to work with the array
select(attributes.title) %>% # to maintain this variable
enter_object(values) %>% # to enter the array where the final data are
gather_array() %>% spread_all() %>% # same as before to work with the array
select(indicator = attributes.title, value, percentage, datetime) # select final data
Basically it's the same process enter_object %>% gather_array %>% spread_all %>% select
repeated twice. You only have to name the objects you want to enter on each level and the pieces of information you want to select.
Upvotes: 1