Reputation: 739
I have a namedtuple defined as follows:
In[37]: from collections import namedtuple
Point = namedtuple('Point', 'x y')
The nested dictionary has the following format:
In[38]: d
Out[38]:
{1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}
I am trying to create a pandas DataFrame from the dictionary d without using for loops.
I have succeeded in creating the dataframe from a subset of the dictionary by doing this:
In[40]: df=pd.DataFrame(d[1][None].values())
In[41]: df
Out[41]:
   x  y
0  1  5
1  4  8
But I want to be able to create the DataFrame from the entire dictionary.
I want the DataFrame to look like this (I am using MultiIndex notation):
In[42]: df
Out[42]:
Subcase Step ID        x        y
1       None 1       1.0      5.0
             2       4.0      8.0
2       None 1   45324.0  24338.0
             2      45.0     38.0
The DataFrame.from_dict method only supports up to two levels of nesting, so I was not able to use it. I am also open to changing the structure of d to achieve this; it does not even have to be a dictionary.
Thank you.
Upvotes: 5
Views: 1374
Reputation: 739
I decided to flatten the keys into tuples (tested with pandas 0.18.1):
In [5]: from collections import namedtuple
In [6]: Point = namedtuple('Point', 'x y')
In [11]: from collections import OrderedDict
In [14]: d=OrderedDict()
In [15]: d[(1,None,1)]=Point(x=1.0, y=5.0)
In [16]: d[(1,None,2)]=Point(x=4.0, y=8.0)
In [17]: d[(2,None,1)]=Point(x=45324.0, y=24338.0)
In [18]: d[(2,None,2)]=Point(x=45.0, y=38.0)
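Finally, build the DataFrame, using the tuple keys as a MultiIndex (wrapping the dict views in list() is a precaution for Python 3, where values() and keys() return views rather than lists):
In [7]: import pandas as pd
In [8]: df = pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(list(d.keys()), names=['Subcase', 'Step', 'ID']))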
In [9]: df
Out[9]:
                       x        y
Subcase Step ID
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
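For reference, a minimal end-to-end sketch of the same idea that starts from the nested d in the question instead of typing the tuple keys by hand. It assumes Python 3.7+ (so plain dicts keep insertion order) and that every branch is exactly three levels deep; the name flat is illustrative only.

```python
from collections import namedtuple

import pandas as pd

Point = namedtuple('Point', 'x y')

# Nested dict from the question: Subcase -> Step -> ID -> Point
d = {1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
     2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}

# Collapse the three key levels into one (Subcase, Step, ID) tuple key.
flat = {
    (subcase, step, pid): point
    for subcase, steps in d.items()
    for step, ids in steps.items()
    for pid, point in ids.items()
}

# namedtuple fields become the columns; the tuple keys become a MultiIndex.
# list() turns the dict views into plain lists before handing them to pandas.
df = pd.DataFrame(
    list(flat.values()),
    index=pd.MultiIndex.from_tuples(list(flat.keys()),
                                    names=['Subcase', 'Step', 'ID']),
)
print(df)
```

As in the output above, the all-None Step level should be displayed as NaN.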
Upvotes: 0
Reputation: 5212
There are already several answers to similar questions on SO (here, here, or here), and they can be adapted to this problem as well. However, none of them is general enough to run on an arbitrary dict, so I decided to write something more universal.
This function can be run on any dict, as long as every branch of the dict has the same number of levels (depth); otherwise it will most likely raise an error.
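import pandas as pd

def frame_from_dict(dic, depth=None, **kwargs):
    def get_dict_depth(dic):
        # Depth is inferred from the first value at each level only
        if not isinstance(dic, dict):
            return 0
        for v in dic.values():
            return get_dict_depth(v) + 1

    if depth is None:
        depth = get_dict_depth(dic)

    if depth == 0:
        # Leaf: turn it into a Series (a namedtuple gets a 0, 1, ... index)
        return pd.Series(dic)
    elif depth > 0:
        items = list(dic.items())
        try:
            # Sort keys and values together so each key keeps its own value
            items = sorted(items, key=lambda kv: kv[0])
        except TypeError:
            # unorderable key types: keep the original order
            pass
        keys = [k for k, _ in items]
        vals = [frame_from_dict(v, depth - 1) for _, v in items]
        # kwargs (e.g. names=...) only apply to the top-level concat
        return pd.concat(vals, axis=1, keys=keys, **kwargs)
    raise ValueError("depth should be a nonnegative integer or None")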
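The get_dict_depth helper in the function above only follows the first value at each level, so a ragged dict is detected late, if at all. If you want to fail early with a clear message, a stricter check could look roughly like this; leaf_depths and check_uniform_depth are hypothetical helpers, not part of pandas.

```python
def leaf_depths(obj):
    """Return the set of nesting depths at which non-dict leaves occur."""
    if not isinstance(obj, dict):
        return {0}
    if not obj:
        # An empty dict has no leaves; count it as one level deep.
        return {1}
    return {depth + 1 for value in obj.values() for depth in leaf_depths(value)}


def check_uniform_depth(dic):
    """Return the common depth of all leaves, or raise if the dict is ragged."""
    depths = leaf_depths(dic)
    if len(depths) != 1:
        raise ValueError("ragged dict: leaves found at depths %s" % sorted(depths))
    return depths.pop()
```

Calling check_uniform_depth(d) before frame_from_dict(d) would then either return the depth to pass in or stop with an explicit error.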
For the sake of generality I dropped special handling of namedtuples (their field names are lost, so the columns come out as 0 and 1), but the function can be tweaked if needed.
In this particular case, it can be applied as follows:
df = frame_from_dict(d, names=['Subcase', 'Step', 'ID']).T
df.columns = ['x', 'y']
df
Out[115]:
                       x        y
Subcase Step ID
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
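If you would rather keep the namedtuple field names, so that the df.columns = ['x', 'y'] step is not needed, the tweak could look roughly like this. It is a sketch, not the original author's code: frame_from_dict_nt and is_namedtuple are hypothetical names, and detecting a namedtuple through its _fields attribute is a common heuristic rather than an official API. Keys and values are sorted together, so each key stays paired with its value.

```python
from collections import namedtuple

import pandas as pd

Point = namedtuple('Point', 'x y')


def is_namedtuple(obj):
    # Heuristic (assumption): a tuple subclass that exposes _fields
    return isinstance(obj, tuple) and hasattr(obj, '_fields')


def frame_from_dict_nt(dic, depth=None, **kwargs):
    """Variant of frame_from_dict that keeps namedtuple field names."""
    def get_dict_depth(obj):
        if not isinstance(obj, dict):
            return 0
        for v in obj.values():  # only the first value is inspected
            return get_dict_depth(v) + 1
        return 1  # empty dict

    if depth is None:
        depth = get_dict_depth(dic)
    if depth < 0:
        raise ValueError("depth should be a nonnegative integer or None")
    if depth == 0:
        if is_namedtuple(dic):
            # Field names ('x', 'y') become the Series index
            return pd.Series(dic._asdict())
        return pd.Series(dic)

    items = list(dic.items())
    try:
        items = sorted(items, key=lambda kv: kv[0])
    except TypeError:
        pass  # unorderable key types: keep the original order
    keys = [k for k, _ in items]
    vals = [frame_from_dict_nt(v, depth - 1) for _, v in items]
    # kwargs (e.g. names=...) only apply to the top-level concat
    return pd.concat(vals, axis=1, keys=keys, **kwargs)
```

On the question's d, frame_from_dict_nt(d, names=['Subcase', 'Step', 'ID']).T should then come out with x and y columns directly.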
Upvotes: 2