Ouroboroski
Ouroboroski

Reputation: 171

Transform list of Pandas DataFrames to nested dictionary with values as lists

I have an ordered list of pandas dataframes, each contains data for a given year:

yearly_data_list = [df_2000, df_2001, ... , df_2010]

These dataframes have the same shape, say R rows and C columns, and the same column names and index.

I want to create N = R*C time series of length len(yearly_data_list), one for each entry in the dataframes and put them into a nested dictionary where the final value is the list. For a given entry in the dataframes, the first value of the corresponding list would be the entry of df_2000, the second value the entry df_2001, etc.

For example, say I have N=3 dataframes with 2 rows (R=2) and 3 columns (C=3) each:

df_2000 = pd.DataFrame(data = [[12, 13, 14], [15, 16, 17]], columns=list('ABC'), index=list('xy'))

   A   B   C
x  12  13  14
y  15  16  17

df_2001 = pd.DataFrame(data = [[6, 7, 8], [9, 10, 11]], columns=list('ABC'), index=list('xy'))

   A   B   C
x  6   7   8
y  9  10  11

df_2002 = pd.DataFrame(data = [[0, 1, 2], [3, 4, 5]], columns=list('ABC'), index=list('xy'))

   A  B  C
x  0  1  2
y  3  4  5

In this case, I would like to obtain R*C=6 lists (time series) of length 3 and put the inside a nested dictionary that looks something like:

{'x': {'A': [12, 6, 0], 'B': [13, 7, 1], 'C': [14, 8, 2]},
 'y': {'A': [15, 9, 3], 'B': [16, 10, 4], 'C': [17, 11, 5]}}

Note that on the outermost level the keys come from the index and in the innermost level the keys come from the column names.

I know that with pd.Dataframe.T.to_dict() I could get 3 dictionaries that look like this but contain the individual values for each dataframe instead of the lists. In this SO answer they merge multiple dictionaries but I have not been able to make it work for my nested dictionary case.

Any help using the link's approach, or any other, would be greatly appreciated. Thanks

Upvotes: 1

Views: 60

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

This can be done with concat and groupby:

(pd.concat([df_2000, df_2001, df_2002])
   .groupby(level=0).agg(list)
   .to_dict('index')
 )

Output:

{'x': {'A': [12, 6, 0], 'B': [13, 7, 1], 'C': [14, 8, 2]},
 'y': {'A': [15, 9, 3], 'B': [16, 10, 4], 'C': [17, 11, 5]}}

Upvotes: 1

Related Questions