Reputation: 676
I have a pandas DataFrame with a daily DatetimeIndex, built as follows:
import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)
index = pd.date_range(start, end)
Cols = ['Returns']
df = pd.DataFrame(abs(np.random.randn(366,1)), index=index, columns=Cols)
I need to transform it so that the index is the year and the columns are the months. The expected output has this shape:
start1 = 2011
end1 = 2012
index1 = [start1, end1]  # the integer years, not the datetime objects start/end
cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df_out = pd.DataFrame(abs(np.random.randn(2,12)), index=index1, columns=cols2)
The single value for each year and month can be the sum or the average of that month's daily values. I tried a DataFrame groupby as follows:
DFList = []
for group in df.groupby(df.index.month):
    DFList.append(group[1])
r2 = pd.concat([DFList[0], DFList[1], DFList[2], DFList[3], DFList[4],
                DFList[5], DFList[6], DFList[7], DFList[8], DFList[9],
                DFList[10], DFList[11]], ignore_index=True, axis=1)
cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
r2.columns = cols2
I am confused at this point and unable to proceed further. Thanking you in anticipation. Please suggest a way forward.
Upvotes: 2
Views: 155
Reputation: 862406
Use pivot_table with an aggregate function, passing DatetimeIndex.year as the index and DatetimeIndex.month as the columns:
df = df.pivot_table(index=df.index.year,
                    columns=df.index.month,
                    values='Returns',
                    aggfunc='sum')
print(df)
              1         2          3          4          5          6  \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742
2012   0.023056       NaN        NaN        NaN        NaN        NaN

              7          8          9         10         11        12
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568
2012        NaN        NaN        NaN        NaN        NaN       NaN
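Since the question allows either a sum or an average, the same call should also work with aggfunc='mean'. Below is a minimal, self-contained sketch of that variant. The seeded generator and the name monthly_mean are illustrative assumptions, not part of the original post, so the numbers will differ from the output above.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (seed chosen only for repeatability).
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame({"Returns": np.abs(rng.standard_normal(len(index)))}, index=index)

# Rows are years, columns are month numbers, each cell is the mean of that month's daily values.
monthly_mean = df.pivot_table(index=df.index.year,
                              columns=df.index.month,
                              values="Returns",
                              aggfunc="mean")
print(monthly_mean.round(3))
```

Months with no data (February to December 2012 here) come out as NaN, as in the sum version.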
If you need month names in the correct calendar order, one solution (again starting from the original df) is an ordered CategoricalIndex built from DatetimeIndex.strftime:
cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = df.pivot_table(index=df.index.year,
                    columns=pd.CategoricalIndex(df.index.strftime('%b'),
                                                ordered=True,
                                                categories=cols2),
                    values='Returns',
                    aggfunc='sum')
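No output is shown for the categorical variant, so here is a small self-contained check that the columns come out in calendar order rather than alphabetical order. It is a sketch under assumptions: the seed and the names month_names and by_name are made up for illustration, and strftime('%b') is assumed to produce English abbreviations (it depends on the locale).

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (seed chosen only for repeatability).
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame({"Returns": np.abs(rng.standard_normal(len(index)))}, index=index)

cols2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# An ordered categorical makes the grouping sort by calendar position, not alphabetically.
month_names = pd.CategoricalIndex(df.index.strftime('%b'),
                                  categories=cols2,
                                  ordered=True)
by_name = df.pivot_table(index=df.index.year,
                         columns=month_names,
                         values='Returns',
                         aggfunc='sum')
print(list(by_name.columns))
```

Whether the empty 2012 cells show as NaN or 0 may depend on the pandas version, because of how unobserved categories are filled.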
Another solution is to use DataFrame.reindex to put the month columns in order:
cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = (df.pivot_table(index=df.index.year,
                     columns=df.index.strftime('%b'),
                     values='Returns',
                     aggfunc='sum').reindex(columns=cols2))
print(df)
            Jan       Feb        Mar        Apr        May        Jun  \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742
2012   0.023056       NaN        NaN        NaN        NaN        NaN

            Jul        Aug        Sep        Oct        Nov       Dec
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568
2012        NaN        NaN        NaN        NaN        NaN       NaN
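The groupby attempt from the question can probably be finished in a similar spirit: group by year and month together, aggregate, then unstack the month level into columns. This is a sketch of an alternative to pivot_table, not part of the approach above. The seed and the name out are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (seed chosen only for repeatability).
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame({"Returns": np.abs(rng.standard_normal(len(index)))}, index=index)

cols2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Group by (year, month), sum each group, then move the month level into the columns.
out = df['Returns'].groupby([df.index.year, df.index.month]).sum().unstack()

# Columns are month numbers 1..12 at this point; map them to names without relying on the locale.
out.columns = [cols2[m - 1] for m in out.columns]
print(out.round(3))
```

Replacing sum() with mean() gives the monthly average instead.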
Upvotes: 2