snowleopard

Reputation: 739

Nested dictionary of namedtuples to pandas dataframe

I have namedtuples defined as follows:

In[37]: from collections import namedtuple
        Point = namedtuple('Point', 'x y')

The nested dictionary has the following format:

In[38]: d
Out[38]: 
{1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}

I am trying to create a pandas DataFrame from the dictionary d without writing explicit for loops.

I have succeeded in creating the dataframe from a subset of the dictionary by doing this:

In[40]: df = pd.DataFrame(list(d[1][None].values()))

In[41]: df

Out[41]: 
   x  y
0  1  5
1  4  8

But I want to be able to create the DataFrame from the entire dictionary.

I want the DataFrame to look like this (I am using MultiIndex notation):

In[42]: df
Out[42]:
Subcase Step ID  x       y
1       None 1   1.0     5.0
             2   4.0     8.0
2       None 1   45324.0 24338.0
             2   45.0    38.0

The from_dict method of DataFrame only supports up to two levels of nesting, so I was not able to use it. I am also considering changing the structure of the dictionary d to achieve my goal. It does not necessarily have to be a dictionary, either.

Thank you.

Upvotes: 5

Views: 1374

Answers (2)

snowleopard

Reputation: 739

I decided to flatten the keys into a tuple (tested using pandas 0.18.1):

In [5]: from collections import namedtuple

In [6]: Point = namedtuple('Point', 'x y')

In [11]: from collections import OrderedDict

In [14]: d=OrderedDict()

In [15]: d[(1,None,1)]=Point(x=1.0, y=5.0)

In [16]: d[(1,None,2)]=Point(x=4.0, y=8.0)

In [17]: d[(2,None,1)]=Point(x=45324.0, y=24338.0)

In [18]: d[(2,None,2)]=Point(x=45.0, y=38.0)

Finally,

In [7]: import pandas as pd

In [8]: df = pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(list(d.keys()), names=['Subcase', 'Step', 'ID']))

In [9]: df
Out[9]: 
                       x        y
Subcase Step ID                  
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
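
If the data already exists in the nested layout from the question, the tuple keys do not have to be typed by hand. The following is only a sketch, assuming exactly three levels of nesting as in the question; the names flat and point_id are illustrative. It still iterates internally through a dict comprehension, so it avoids hand-written loop bodies rather than iteration itself.

```python
from collections import namedtuple

import pandas as pd

Point = namedtuple('Point', 'x y')

# Nested dictionary in the layout shown in the question.
d = {1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
     2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}

# Flatten the three key levels into (subcase, step, id) tuples.
flat = {
    (subcase, step, point_id): point
    for subcase, steps in d.items()
    for step, points in steps.items()
    for point_id, point in points.items()
}

index = pd.MultiIndex.from_tuples(list(flat.keys()),
                                  names=['Subcase', 'Step', 'ID'])
# pandas takes the column names from the namedtuple fields.
df = pd.DataFrame(list(flat.values()), index=index)
```

The resulting df should match the frame shown above, with the None step displayed as NaN.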

Upvotes: 0

ptrj

Reputation: 5212

There are already several answers to similar questions on SO (here, here, or here). Those solutions can be adapted to this problem as well. However, none of them is general enough to run on an arbitrary dict, so I decided to write something more universal.

This is a function that can be run on any dict, as long as every branch of the dict has the same number of levels (depth); otherwise it will most likely raise an exception.

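import pandas as pd

def frame_from_dict(dic, depth=None, **kwargs):
    def get_dict_depth(dic):
        # Depth is measured along the first value only; all branches
        # are assumed to be equally deep.
        if not isinstance(dic, dict):
            return 0
        for v in dic.values():
            return get_dict_depth(v) + 1

    if depth is None:
        depth = get_dict_depth(dic)

    if depth == 0:
        # Leaf value (e.g. a tuple): wrap it in a Series.
        return pd.Series(dic)
    elif depth > 0:
        items = list(dic.items())
        try:
            # Sort keys and values together so they stay aligned.
            items = sorted(items, key=lambda kv: kv[0])
        except TypeError:
            # unorderable types: keep the dict's own order
            pass
        keys = [k for k, _ in items]
        vals = [frame_from_dict(v, depth - 1) for _, v in items]
        return pd.concat(vals, axis=1, keys=keys, **kwargs)

    raise ValueError("depth should be a nonnegative integer or None")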

For the sake of generality, I did not treat the namedtuple case from this question specially, so the field names are lost. The function can be tweaked if needed.

In this particular case, it can be applied as follows:

df = frame_from_dict(d, names=['Subcase', 'Step', 'ID']).T
df.columns = ['x', 'y']
df
Out[115]: 
                       x        y
Subcase Step ID                  
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
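
One possible way to keep the namedtuple field names while still handling any depth is sketched below. It is not a patch to frame_from_dict: it swaps the recursive pd.concat for a different technique, namely walking the dict and collecting key paths. The names iter_leaves and frame_from_nested are hypothetical, and every leaf is assumed to sit at the same depth.

```python
from collections import namedtuple

import pandas as pd

Point = namedtuple('Point', 'x y')


def iter_leaves(node, path=()):
    """Yield (key_path, leaf) pairs for every non-dict value in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, path + (key,))
    else:
        yield path, node


def frame_from_nested(dic, names=None):
    """Build a MultiIndex DataFrame; assumes all leaves are equally deep."""
    paths, leaves = zip(*iter_leaves(dic))
    index = pd.MultiIndex.from_tuples(list(paths), names=names)
    # A list of namedtuples lets pandas use the field names as columns.
    return pd.DataFrame(list(leaves), index=index)


# Nested dictionary in the layout shown in the question.
d = {1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
     2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}

df = frame_from_nested(d, names=['Subcase', 'Step', 'ID'])
```

With this variant the columns come out as x and y directly, so the transpose and the manual df.columns assignment are not needed. An empty dict is not handled and will raise.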

Upvotes: 2
