eduardokapp
eduardokapp

Reputation: 1751

JSON to Long Table in Python

I have the following JSON defined as a dict in python:

specs = {
    "state/79900116649511": {
        "active": {
            "window_size": 10000,
            "batch": True,
            "n_col": 50,
            "n_row": 200
        },
        "voltan": {
            "window_size": 50
        },
        "cura": {
            "window_size": 100
        }
    },
    "state/79900216649511": {
        "active": {
            "window_size": 10000,
            "batch": True,
            "n_col": 50,
            "n_row": 200
        },
        "voltan": {
            "window_size": 50
        },
        "cura": {
            "window_size": 100
        }
    }
}

I want to create a long table from this dictionary. The output should be:

state variable window_size batch n_col n_row
79900116649511 active 10000 True 50 200
79900116649511 voltan 50 NULL NULL NULL
79900116649511 cura 100 NULL NULL NULL
79900216649511 active 10000 True 50 200
79900216649511 voltan 50 NULL NULL NULL
79900216649511 cura 100 NULL NULL NULL

How can I do that? I thought about using nested loops, but that does not seem very efficient.

Upvotes: 1

Views: 82

Answers (2)

wts
wts

Reputation: 96

This solution is either simple but works

import pandas as pd

df = pd.DataFrame.from_dict(specs, orient='index')
df = df.unstack(1).reset_index()
df = df.rename(columns={'level_0': 'variable', 'level_1': 'state'})
columns = df[0].iloc[0].keys()
for i in columns:
    df[i] = df[0].apply(lambda x: x.get(i, None))
    
df = df.drop(columns=0)
df.state = df.state.apply(lambda x: x.replace('state/',''))
df = df[['state', 'variable', 'window_size', 'batch', 'n_col', 'n_row']].sort_values(by=['state'])

Results:

            state variable  window_size batch  n_col  n_row
0  79900116649511   active        10000  True   50.0  200.0
1  79900116649511   voltan           50  None    NaN    NaN
2  79900116649511     cura          100  None    NaN    NaN
3  79900216649511   active        10000  True   50.0  200.0
4  79900216649511   voltan           50  None    NaN    NaN
5  79900216649511     cura          100  None    NaN    NaN

Upvotes: 2

dsalaj
dsalaj

Reputation: 3197

There is no simple solution as you use the first level key as "state" column, second level key as "variable" column, and only the rest as the column values that can be incomplete.

So you need at least two "loops" with special rules how to handle those keys.

Generally, to flatten/normalize a nested dict/json you can use the helper function json_normalize from the pandas library. For example if you only needed the top level keys from your json to be used as index, you could do the following:

import pandas as pd

dfs = []
for key in specs.keys():
    df = pd.json_normalize(specs[key])
    df.index = [key]
    dfs.append(df)
pd.concat(dfs)

Which results in:

enter image description here

Upvotes: 1

Related Questions