Reputation: 17680
There are several questions like this, but the problem is that I have a dictionary of dictionary of ... and not a list of dictionary
I am importing a yaml file like
category1:
region1:
effect1:
up: 10
down: 20
effect2:
up: 30
down: 50
...
region2:
effect1:
up: 30
down: 40
...
region3:
...
category2:
region1:
...
when importing with yaml.load
I get one single big dictionary like
{'category':
{'region1':
{'effect1':
{'up': 10, 'down': 20},
...
I can I import this in pandas? I want to get a flat table with multiindex:
up down
category1 region1 effect1 10 20
category1 region1 effect2 30 50
...
category1 region2 effect1 30 40
...
I would like a general solution, since for example I have some data that are missing the region
layer.
Upvotes: 1
Views: 1257
Reputation: 30605
Use json_normalize
if you have a dict to generate a dataframe, then split and expand the columns i.e
di = {'category1': {'region1': {'effect1': {'up': 10, 'down': 20},'effect2': {'up': 12, 'down': 22}}},
'category2':{'region2': {'effect1': {'up': 10, 'down': 20},'effect2': {'up': 12, 'down': 22}}}}
k = pd.io.json.json_normalize(di)
#category1.region1.effect1.down category1.region1.effect1.up category1.region1.effect2.down category1.region1.effect2.up ...
#0 20 10 22 12 ...
k.columns = k.columns.str.split('.', expand=True)
# Transpose and unstack the dataframe for getting the actual dataframe
main_df = k.T.unstack()[0]
Output:
down up
category1 region1 effect1 20 10
effect2 22 12
category2 region2 effect1 20 10
effect2 22 12
Or an equivalent :
k = pd.io.json.json_normalize(di).T
k.index = k.index.str.split('.',expand=True)
mdf = k.unstack()[0]
Upvotes: 4