aramnhammer
aramnhammer

Reputation: 145

Pandas DataFrame Table with Internal Stacked sub-table

Lets say I have the following input data coming from a database.

    data = [{'abc': 'None',
          'cde': 4,
          'efg': 1,
          'blah': {'k': 23,
                   'y': 26,
                   'u':48}
          },{'abc': 'hdf',
          'cde':10,
          'efg': 2,
          'blah': {'k': 244,
                   'y': 747,
                   'u':75,
                   'g':88},
        },
]

The following is the look of the pandas data frame which I would like to produce:

                       blah
       cde  abc    g    k    u   y
efg 1   4  'None'  N/A  23   48  26
    2   10 'hdf'   88   244  75  747

key features:

Attempts: When I take the data into a pandas data frame I get the following:

 abc  cde  efg  blah
k  None    4    1    23
u  None    4    1    48
y  None    4    1    26
g   hdf   10    2    88
k   hdf   10    2   244
u   hdf   10    2    75
y   hdf   10    2   747     

I suspect pandas is taking the nested dict keys as an index, so i reset it which gives the following:

index   abc  cde  efg  blah
0     k  None    4    1    23
1     u  None    4    1    48
2     y  None    4    1    26
3     g   hdf   10    2    88
4     k   hdf   10    2   244
5     u   hdf   10    2    75
6     y   hdf   10    2   747

So I pivot this table by the pandas supplied 'index' column, i would expect it to have a stacked cross tab with the 'blah' parent as the top most item, with sub columns being its nested dict ('k','u', 'y', 'g'). In addition I expect the other columns to stay intact, with only the 'blah' section having merged cells. But, we get this instead:

>>> t_pivot = t_concat.pivot_table(columns='index')

>>> t_pivot

index     g      k     u      y
blah   88.0  133.5  61.5  386.5
cde    10.0    7.0   7.0    7.0
efg     2.0    1.5   1.5    1.5

Ok not great, So maybe i need to be explicit about what data we should index by:

t_pivot = t_concat.pivot_table(index = 'efg',columns=['index', 'cde'])

>>> t_pivot

       blah                                      
index     g     k            u           y       
cde      10    4      10    4     10    4      10
efg                                              
1       NaN  23.0    NaN  48.0   NaN  26.0    NaN
2      88.0   NaN  244.0   NaN  75.0   NaN  747.0

Still not right.

So any ideas on how to form a data frame from a dict with a nested dict? Primarily making cells merge properly for parents and their respective sub elements. The end result will go into an html document.

Other things I have tried: - turning the inner dict keys into tuples - using the stack() method, but cant seem to get anything looking right

Upvotes: 3

Views: 4614

Answers (1)

Dennis Golomazov
Dennis Golomazov

Reputation: 17349

df = pd.DataFrame(data).set_index('efg')
sub_df = df.blah.apply(pd.Series)
sub_df = pd.concat({'blah': sub_df}, axis=1, names=['l1', 'l2'])
df = pd.concat({'': df}, axis=1, names=['l1', 'l2'])
df = pd.concat((df, sub_df), axis=1)
df = df.drop(('', 'blah'), axis=1)
print df

l1             blah                    
l2    abc cde     g      k     u      y
efg                                    
1    None  4  NaN    23.0   48.0  26.0 
2    hdf   10  88.0  244.0  75.0  747.0

The hierarchical column solution is based on this answer.

Upvotes: 1

Related Questions