Abhishek Kulkarni

Reputation: 676

How to transform pandas dataframe

I have a pandas DataFrame with a daily DatetimeIndex, built as follows:

import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

index = pd.date_range(start, end)

Cols = ['Returns']

df = pd.DataFrame(abs(np.random.randn(366,1)), index=index, columns=Cols)

I need to transform it so that the index is the year and the columns are the months. The expected output has the following shape (the values here are only placeholders):

start1 = 2011
end1 = 2012

index1 = [start1, end1]
cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df_out = pd.DataFrame(abs(np.random.randn(2,12)), index=index1, columns=cols2)

Each cell (one value per year and month) can be the sum or the average of the daily returns. I tried DataFrame.groupby as follows:

DFList = []
for group in df.groupby(df.index.month):
    DFList.append(group[1])


r2 = pd.concat([DFList[0], DFList[1] ,DFList[2], DFList[3], DFList[4], 
DFList[5],DFList[6],DFList[7],DFList[8], DFList[9], 
DFList[10],DFList[11]],ignore_index=True,axis=1)
cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
r2.columns=cols2

I am stuck at this point and cannot see how to proceed. Please suggest a way forward. Thanks in advance.

Upvotes: 2

Views: 155

Answers (1)

jezrael

Reputation: 862406

Use pivot_table with an aggregate function, taking the rows from DatetimeIndex.year and the columns from DatetimeIndex.month:

# Assign to a new name so the original df stays available for the later snippets
out = df.pivot_table(index=df.index.year,
                     columns=df.index.month,
                     values='Returns',
                     aggfunc='sum')

print(out)
             1         2          3          4          5          6   \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742   
2012   0.023056       NaN        NaN        NaN        NaN        NaN   

             7          8          9          10         11        12  
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568  
2012        NaN        NaN        NaN        NaN        NaN       NaN
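
The 2012 row holds a single day because pd.date_range includes both endpoints by default, so 2012-01-01 is the 366th entry. A minimal sketch, assuming only 2011 is wanted, restricts the frame before pivoting. The names df_2011 and out_2011 and the fixed random seed are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")  # both endpoints included
df = pd.DataFrame(np.abs(rng.standard_normal((len(index), 1))),
                  index=index, columns=["Returns"])

# Partial string indexing keeps only the rows dated in 2011.
df_2011 = df.loc["2011"]

out_2011 = df_2011.pivot_table(index=df_2011.index.year,
                               columns=df_2011.index.month,
                               values="Returns",
                               aggfunc="sum")
print(out_2011.shape)
```

With the stray 2012-01-01 row dropped, the result has one row and no NaN cells.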

If you need month names in the correct order, one solution is an ordered CategoricalIndex built from DatetimeIndex.strftime:

cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
out = df.pivot_table(index=df.index.year,
                     columns=pd.CategoricalIndex(df.index.strftime('%b'),
                                                 ordered=True,
                                                 categories=cols2),
                     values='Returns',
                     aggfunc='sum')
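
To see why the ordering step matters, the sketch below compares plain strftime('%b') labels, which come out sorted alphabetically, with the ordered categorical. It assumes an English (or C) locale for the month abbreviations. The names plain, months and ordered and the random seed are illustrative, and observed=False is passed explicitly only to keep the behaviour the same across pandas versions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
index = pd.date_range("2011-01-01", "2011-12-31")
df = pd.DataFrame(np.abs(rng.standard_normal((len(index), 1))),
                  index=index, columns=["Returns"])

cols2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Plain string labels: the pivot sorts the columns alphabetically (Apr, Aug, Dec, ...).
plain = df.pivot_table(index=df.index.year,
                       columns=df.index.strftime('%b'),
                       values='Returns',
                       aggfunc='sum')
print(list(plain.columns))

# Ordered categorical labels: the columns follow the category order in cols2.
months = pd.CategoricalIndex(df.index.strftime('%b'),
                             categories=cols2, ordered=True)
ordered = df.pivot_table(index=df.index.year,
                         columns=months,
                         values='Returns',
                         aggfunc='sum',
                         observed=False)
print(list(ordered.columns))
```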

Another solution is to use DataFrame.reindex to put the month-name columns back in calendar order:

cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
out = (df.pivot_table(index=df.index.year,
                      columns=df.index.strftime('%b'),
                      values='Returns',
                      aggfunc='sum')
         .reindex(columns=cols2))

print(out)
            Jan       Feb        Mar        Apr        May        Jun  \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742   
2012   0.023056       NaN        NaN        NaN        NaN        NaN   

            Jul        Aug        Sep        Oct        Nov       Dec  
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568  
2012        NaN        NaN        NaN        NaN        NaN       NaN  
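
The question allows either a sum or an average per cell. A minimal sketch of the average variant follows, together with a groupby/unstack formulation that should give the same table. The groupby route is an alternative technique, not the one used in the answer above, and the names monthly_mean and alt and the random seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame(np.abs(rng.standard_normal((len(index), 1))),
                  index=index, columns=["Returns"])

# Same pivot as in the answer, with the mean instead of the sum.
monthly_mean = df.pivot_table(index=df.index.year,
                              columns=df.index.month,
                              values='Returns',
                              aggfunc='mean')

# Alternative: group by (year, month), aggregate, then move the month level to columns.
alt = (df.groupby([df.index.year, df.index.month])['Returns']
         .mean()
         .unstack())

print(monthly_mean.shape, alt.shape)
```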

Upvotes: 2
