Reputation: 171
I have an ordered list of pandas dataframes, each contains data for a given year:
yearly_data_list = [df_2000, df_2001, ... , df_2010]
These dataframes have the same shape, say R rows and C columns, and the same column names and index.
I want to create N = R*C time series of length len(yearly_data_list)
, one for each entry in the dataframes and put them into a nested dictionary where the final value is the list. For a given entry in the dataframes, the first value of the corresponding list would be the entry of df_2000
, the second value the entry df_2001
, etc.
For example, say I have N=3 dataframes with 2 rows (R=2) and 3 columns (C=3) each:
df_2000 = pd.DataFrame(data = [[12, 13, 14], [15, 16, 17]], columns=list('ABC'), index=list('xy'))
A B C
x 12 13 14
y 15 16 17
df_2001 = pd.DataFrame(data = [[6, 7, 8], [9, 10, 11]], columns=list('ABC'), index=list('xy'))
A B C
x 6 7 8
y 9 10 11
df_2002 = pd.DataFrame(data = [[0, 1, 2], [3, 4, 5]], columns=list('ABC'), index=list('xy'))
A B C
x 0 1 2
y 3 4 5
In this case, I would like to obtain R*C=6 lists (time series) of length 3 and put the inside a nested dictionary that looks something like:
{'x': {'A': [12, 6, 0], 'B': [13, 7, 1], 'C': [14, 8, 2]},
'y': {'A': [15, 9, 3], 'B': [16, 10, 4], 'C': [17, 11, 5]}}
Note that on the outermost level the keys come from the index and in the innermost level the keys come from the column names.
I know that with pd.Dataframe.T.to_dict()
I could get 3 dictionaries that look like this but contain the individual values for each dataframe instead of the lists. In this SO answer they merge multiple dictionaries but I have not been able to make it work for my nested dictionary case.
Any help using the link's approach, or any other, would be greatly appreciated. Thanks
Upvotes: 1
Views: 60
Reputation: 150735
This can be done with concat
and groupby
:
(pd.concat([df_2000, df_2001, df_2002])
.groupby(level=0).agg(list)
.to_dict('index')
)
Output:
{'x': {'A': [12, 6, 0], 'B': [13, 7, 1], 'C': [14, 8, 2]},
'y': {'A': [15, 9, 3], 'B': [16, 10, 4], 'C': [17, 11, 5]}}
Upvotes: 1