jlp
jlp

Reputation: 175

Extract tidy data from complex nested JSON in R

Extracting tidy data from a simple JSON is trivial with the tidyjson package (https://cran.r-project.org/web/packages/tidyjson/vignettes/introduction-to-tidyjson.html)

I haven't been able to apply this logic to a complex nested JSON structure. Similar question such as this one (how do you extract data from nested json data in R) are too specific so I can't extrapolate it to other cases.

A more general case could be given by this structure (see working reproducible examples here: 1.4 Example requests: https://www.ree.es/en/apidatos)

{
        "data": {
          "type": "WIDGET TYPE",
          "id": "WIDGET_ID",
          "attributes": {
            "title": "WIDGET NAME",
            "last-update": "2019-02-01T08:26:34.000+01:00",
            "description": "WIDGET DESCRIPTION",
          },
          "meta": {
            "cache-control": {
              "cache": "HIT",
              "expireAt": "2019-03-01T17:18:22"
            }
          }
        },
        "included": [
          {
            "type": "INDICATOR_1 TYPE",
            "id": "INDICADOR_1_ID",
            "groupId": null,
            "attributes": {
              "title": "INDICADOR_1 NAME",
              "description": "INDICADOR_1 DESCRIPTION",
              "color": "#2fa688",
              "type": "INDICADOR_1 TYPE",
              "magnitude": "INDICADOR_1 MAGNITUDE",
              "composite": false,
              "last-update": "2019-02-19T08:26:34.000+01:00",
              "values": [
                {
                  "value": 12345,
                  "percentage": "VALUE BETWEEN 0 AND 1",
                  "datetime": "2019-02-04T20:44:00.000+01:00"
                }
              ]
            },
           {
            "type": "INDICATOR_2 TYPE",
            "id": "INDICADOR_1_ID",
            "groupId": null,
            "attributes": {
               …
            }
          }
        ]
       }
      }
  1. The first level has an object "data" and an array "included"

  2. The "included" array has one object for each indicator

  3. In each of these objects there is an "attributes" object with a "values" array where the final data are located: "value", "percentage" and "datetime"

The goal would be to extract the data to a tidy dataframe with the columns "type", "title", "value", "percentage" and "datetime"

Upvotes: 0

Views: 944

Answers (1)

jlp
jlp

Reputation: 175

I found out after a lot of trial and error. I answer myself in case it can help anyone else departing from the json object:

library(tidyjson)

json %>% # our json object
  enter_object(included) %>% # to enter the object where the data are
  gather_array() %>% spread_all() %>% # to work with the array
  select(attributes.title) %>% # to maintain this variable
  enter_object(values) %>% # to enter the array where the final data are
  gather_array() %>% spread_all() %>% # same as before to work with the array
  select(indicator = attributes.title, value, percentage, datetime) # select final data

Basically it's the same process enter_object %>% gather_array %>% spread_all %>% select repeated twice. You only have to name the objects you want to enter on each level and the pieces of information you want to select.

Upvotes: 1

Related Questions