Pandas DataFrame Table with Internal Stacked sub-table

Question

Lets say I have the following input data coming from a database.

    data = [{'abc': 'None',
          'cde': 4,
          'efg': 1,
          'blah': {'k': 23,
                   'y': 26,
                   'u':48}
          },{'abc': 'hdf',
          'cde':10,
          'efg': 2,
          'blah': {'k': 244,
                   'y': 747,
                   'u':75,
                   'g':88},
        },
]

The following is the look of the pandas data frame which I would like to produce:

                       blah
       cde  abc    g    k    u   y
efg 1   4  'None'  N/A  23   48  26
    2   10 'hdf'   88   244  75  747

key features:

'efg' is merged vertically and acts as the index (i think, please correct me if i am wrong)
Flat features such as 'cde', 'abc' are treated like normal columns
The nested dict has the parent stacked on top of its respective child elements, any nested field that does not have a value associated with the index, 'efg' gets N/A.

Attempts: When I take the data into a pandas data frame I get the following:

 abc  cde  efg  blah
k  None    4    1    23
u  None    4    1    48
y  None    4    1    26
g   hdf   10    2    88
k   hdf   10    2   244
u   hdf   10    2    75
y   hdf   10    2   747

I suspect pandas is taking the nested dict keys as an index, so i reset it which gives the following:

index   abc  cde  efg  blah
0     k  None    4    1    23
1     u  None    4    1    48
2     y  None    4    1    26
3     g   hdf   10    2    88
4     k   hdf   10    2   244
5     u   hdf   10    2    75
6     y   hdf   10    2   747

So I pivot this table by the pandas supplied 'index' column, i would expect it to have a stacked cross tab with the 'blah' parent as the top most item, with sub columns being its nested dict ('k','u', 'y', 'g'). In addition I expect the other columns to stay intact, with only the 'blah' section having merged cells. But, we get this instead:

>>> t_pivot = t_concat.pivot_table(columns='index')

>>> t_pivot

index     g      k     u      y
blah   88.0  133.5  61.5  386.5
cde    10.0    7.0   7.0    7.0
efg     2.0    1.5   1.5    1.5

Ok not great, So maybe i need to be explicit about what data we should index by:

t_pivot = t_concat.pivot_table(index = 'efg',columns=['index', 'cde'])

>>> t_pivot

       blah                                      
index     g     k            u           y       
cde      10    4      10    4     10    4      10
efg                                              
1       NaN  23.0    NaN  48.0   NaN  26.0    NaN
2      88.0   NaN  244.0   NaN  75.0   NaN  747.0

Still not right.

So any ideas on how to form a data frame from a dict with a nested dict? Primarily making cells merge properly for parents and their respective sub elements. The end result will go into an html document.

Other things I have tried: - turning the inner dict keys into tuples - using the stack() method, but cant seem to get anything looking right

Dennis Golomazov · Accepted Answer

df = pd.DataFrame(data).set_index('efg')
sub_df = df.blah.apply(pd.Series)
sub_df = pd.concat({'blah': sub_df}, axis=1, names=['l1', 'l2'])
df = pd.concat({'': df}, axis=1, names=['l1', 'l2'])
df = pd.concat((df, sub_df), axis=1)
df = df.drop(('', 'blah'), axis=1)
print df

l1             blah                    
l2    abc cde     g      k     u      y
efg                                    
1    None  4  NaN    23.0   48.0  26.0 
2    hdf   10  88.0  244.0  75.0  747.0

The hierarchical column solution is based on this answer.

Pandas DataFrame Table with Internal Stacked sub-table

Answers (1)

Related Questions