Reputation: 1751
I have the following JSON defined as a dict in python:
specs = {
"state/79900116649511": {
"active": {
"window_size": 10000,
"batch": True,
"n_col": 50,
"n_row": 200
},
"voltan": {
"window_size": 50
},
"cura": {
"window_size": 100
}
},
"state/79900216649511": {
"active": {
"window_size": 10000,
"batch": True,
"n_col": 50,
"n_row": 200
},
"voltan": {
"window_size": 50
},
"cura": {
"window_size": 100
}
}
}
I want to create a long table from this dictionary. The output should be:
state | variable | window_size | batch | n_col | n_row |
---|---|---|---|---|---|
79900116649511 | active | 10000 | True | 50 | 200 |
79900116649511 | voltan | 50 | NULL | NULL | NULL |
79900116649511 | cura | 100 | NULL | NULL | NULL |
79900216649511 | active | 10000 | True | 50 | 200 |
79900216649511 | voltan | 50 | NULL | NULL | NULL |
79900216649511 | cura | 100 | NULL | NULL | NULL |
How can I do that? I thought about using nested loops, but that does not seem very efficient.
Upvotes: 1
Views: 82
Reputation: 96
This solution is either simple but works
import pandas as pd
df = pd.DataFrame.from_dict(specs, orient='index')
df = df.unstack(1).reset_index()
df = df.rename(columns={'level_0': 'variable', 'level_1': 'state'})
columns = df[0].iloc[0].keys()
for i in columns:
df[i] = df[0].apply(lambda x: x.get(i, None))
df = df.drop(columns=0)
df.state = df.state.apply(lambda x: x.replace('state/',''))
df = df[['state', 'variable', 'window_size', 'batch', 'n_col', 'n_row']].sort_values(by=['state'])
Results:
state variable window_size batch n_col n_row
0 79900116649511 active 10000 True 50.0 200.0
1 79900116649511 voltan 50 None NaN NaN
2 79900116649511 cura 100 None NaN NaN
3 79900216649511 active 10000 True 50.0 200.0
4 79900216649511 voltan 50 None NaN NaN
5 79900216649511 cura 100 None NaN NaN
Upvotes: 2
Reputation: 3197
There is no simple solution as you use the first level key as "state" column, second level key as "variable" column, and only the rest as the column values that can be incomplete.
So you need at least two "loops" with special rules how to handle those keys.
Generally, to flatten/normalize a nested dict/json you can use the helper function json_normalize
from the pandas
library. For example if you only needed the top level keys from your json to be used as index, you could do the following:
import pandas as pd
dfs = []
for key in specs.keys():
df = pd.json_normalize(specs[key])
df.index = [key]
dfs.append(df)
pd.concat(dfs)
Which results in:
Upvotes: 1