finstats
finstats

Reputation: 1409

Pandas: reading multi-index JSON as pandas data frame

I have a JSON file as follows:

{
    "ALPHA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 0,
                "BTB": 0,
                "YUY": 0
            }
        }
    ],
    "BETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        }
    ],

    "THETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-08-20",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        }
    ]
}

I want to read the above into a pandas data frame where the first index is the date, the second index is the first keys (i.e. "ALPHA", "BETA", "THETA"), the columns are the inner keys (i.e. "BBB" ,"EEE", "BTB" ,"YUY"), and the cell values are the values of these inner keys.

How can I read that into pandas from the JSON file?

Upvotes: 5

Views: 593

Answers (2)

sammywemmy
sammywemmy

Reputation: 28699

I feel you get better performance, and potentially easier manipulation, if you deal with python native data structures outside Pandas, before pulling the final form into Pandas:

Let's flatten the nested dictionary into a single dictionary, using Python's tools:

container = []
for key, value in j.items(): # j is the main dictionary
    for entry in value:
        content = {'date': entry['date'], 
                   'key': key, 
                   # expand the nested constituent data
                   # this gets us a single dictionary 
                   **entry['constituents']}
        container.append(content)

print(container)
[{'date': '2021-06-22',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 1},
 {'date': '2021-09-07',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 0,
  'BTB': 0,
  'YUY': 0},
 {'date': '2021-06-22', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-09-07', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-06-22',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-08-20',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-09-07',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0}]

Now, build the dataframe, and set the required columns as index:

pd.DataFrame(container).set_index(['date', 'key'])

                  BBB  EEE  BTB  YUY
date       key
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0

Upvotes: 1

SeaBean
SeaBean

Reputation: 23217

You can use pd.Series to import the JSON to a Pandas series with ALPHA, BETA as index and elements as list. Then expand the list of JSON to individual JSON by .explode(). Expand the inner JSON to dataframe by .apply() + pd.Series.

Append date as index by .set_index() with append=True; swap date from second index to first index by .swaplevel().

Finally, take the column constituents and further expand the inner JSON to dataframe by .apply() + pd.Series, as follows:

(assume you have already loaded the JSON file into j)

df = (pd.Series(j)
        .explode()
        .apply(pd.Series)
        .set_index('date', append=True)
        .swaplevel()['constituents']
        .apply(pd.Series)
     )

Data input:

j = {'ALPHA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 0, 'BTB': 0, 'YUY': 0}}],
 'BETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}}],
 'THETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-08-20',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}}]}

Output

print(df)


                  BBB  EEE  BTB  YUY
date                                
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0

Upvotes: 4

Related Questions