Reputation: 16987
Say I have a dictionary that looks like this:
dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1,2]}}
and I want a dataframe that looks something like this:
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
Is there a convenient way to do this? If I try:
In [99]:
DataFrame(dictionary)
Out[99]:
A B
a [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]
I get a dataframe where each element is a list. What I need is a multiindex where each level corresponds to the keys in the nested dict and the rows corresponding to each element in the list as shown above. I think I can work a very crude solution but I'm hoping there might be something a bit simpler.
Upvotes: 106
Views: 83091
Reputation: 572
You're looking for the functionality in .stack
:
df = pandas.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# to break out the lists into columns
df = pandas.DataFrame(df[0].values.tolist(), index=df.index)
Upvotes: 39
Reputation: 251355
Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
[5 rows x 4 columns]
Upvotes: 114
Reputation: 303
This solution works for a larger dataframe, it fits what was requested
cols = df.columns
int_cols = len(cols)
col_subset_1 = [cols[x] for x in range(1,int(int_cols/2)+1)]
col_subset_2 = [cols[x] for x in range(int(int_cols/2)+1, int_cols)]
col_subset_1_label = list(zip(['A']*len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B']*len(col_subset_2), col_subset_2))
df.columns = pd.MultiIndex.from_tuples([('','myIndex'),*col_subset_1_label,*col_subset_2_label])
OUTPUT
A B
myIndex a b c d
0 0.159710 1.472925 0.619508 -0.476738 0.866238
1 -0.665062 0.609273 -0.089719 0.730012 0.751615
2 0.215350 -0.403239 1.801829 -2.052797 -1.026114
3 -0.609692 1.163072 -1.007984 -0.324902 -1.624007
4 0.791321 -0.060026 -1.328531 -0.498092 0.559837
5 0.247412 -0.841714 0.354314 0.506985 0.425254
6 0.443535 1.037502 -0.433115 0.601754 -1.405284
7 -0.433744 1.514892 1.963495 -2.353169 1.285580
Upvotes: 0
Reputation: 39
If lists in the dictionary are not of the same lenght, you can adapte the method of BrenBarn.
>>> dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1]}}
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1]}
>>> pandas.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pd.MultiIndex.from_tuples(df.columns)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 NaN
[5 rows x 4 columns]
Upvotes: 3
Reputation: 29
This recursive function should work:
def reform_dict(dictionary, t=tuple(), reform={}):
for key, val in dictionary.items():
t = t + (key,)
if isinstance(val, dict):
reform_dict(val, t, reform)
else:
reform.update({t: val})
t = t[:-1]
return reform
Upvotes: 2
Reputation: 371
dict_of_df = {k: pd.DataFrame(v) for k,v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)
Note that the order of columns is lost for python < 3.6
Upvotes: 27