Ruggero Turra
Ruggero Turra

Reputation: 17680

import nested dictionary in pandas (from yaml)

There are several questions like this, but the problem is that I have a dictionary of dictionary of ... and not a list of dictionary

I am importing a yaml file like

category1:
  region1:
    effect1:
       up: 10
       down: 20
    effect2:
       up: 30
       down: 50
    ...
  region2:
    effect1:
     up: 30
     down: 40
  ...
  region3:
  ...
category2:
  region1:
  ...

when importing with yaml.load I get one single big dictionary like

{'category':
     {'region1':
          {'effect1':
               {'up': 10, 'down': 20},
           ...

I can I import this in pandas? I want to get a flat table with multiindex:

                                 up  down 
category1   region1    effect1   10    20
category1   region1    effect2   30    50
...
category1   region2    effect1   30    40
...

I would like a general solution, since for example I have some data that are missing the region layer.

Upvotes: 1

Views: 1257

Answers (1)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

Use json_normalize if you have a dict to generate a dataframe, then split and expand the columns i.e

di = {'category1': {'region1': {'effect1': {'up': 10, 'down': 20},'effect2': {'up': 12, 'down': 22}}},
  'category2':{'region2': {'effect1': {'up': 10, 'down': 20},'effect2': {'up': 12, 'down': 22}}}}

k = pd.io.json.json_normalize(di)

    #category1.region1.effect1.down  category1.region1.effect1.up  category1.region1.effect2.down  category1.region1.effect2.up ...
#0                              20                            10                               22                            12 ... 

k.columns = k.columns.str.split('.', expand=True)
# Transpose and unstack the dataframe for getting the actual dataframe 
main_df = k.T.unstack()[0]

Output:

                           down  up
category1 region1 effect1    20  10
                  effect2    22  12
category2 region2 effect1    20  10
                  effect2    22  12

Or an equivalent :

k = pd.io.json.json_normalize(di).T
k.index = k.index.str.split('.',expand=True)
mdf = k.unstack()[0]

Upvotes: 4

Related Questions