Pat

Reputation: 1339

Reshaping Pandas dataframe by months

The task is to transform the table below

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])

In[1]: df.groupby(by=[df.index.year, df.index.month]).sum()

Out[1]: 
           values
2000 1   1.181000
     2  -8.005783
     3   6.590623
     4  -6.266232
     5   1.266315
     6   0.384050
     7  -1.418357
     8  -3.132253
     9   0.005496
     10 -6.646101
     11  9.616482
     12  3.960872
2001 1  -0.989869
     2  -2.845278
     3  -1.518746
     4   2.984735
     5  -2.616795
     6   8.360319
     7   5.659576
     8   0.279863
     9  -5.220678
     10  5.077400
     11  1.332519

such that it looks like this

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2000   1.2  -8.0   6.6  -6.3   1.2   0.4  -1.4  -3.1   0.0  -6.6   9.6   3.9
2001  -0.9  -2.8  -1.5   3.0  -2.6   8.3   5.7   0.3  -5.2   5.1   1.3

Additionally, I need an extra column that sums the values for each year, like this:

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec  Year
2000   1.2  -8.0   6.6  -6.3   1.2   0.4  -1.4  -3.1   0.0  -6.6   9.6   3.9   4.7
2001  -0.9  -2.8  -1.5   3.0  -2.6   8.3   5.7   0.3  -5.2   5.1   1.3        10.7

Is there a quick way to do this in pandas, for example with a pivot?

Upvotes: 5

Views: 1696

Answers (4)

piRSquared

Reputation: 294348

Use strftime('%b') in your groupby:

df['values'].groupby([df.index.year, df.index.strftime('%b')]).sum().unstack()


To preserve the order of the months:

df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum().unstack()


With 'Year' at the end:

df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum() \
    .unstack().assign(Year=df['values'].groupby(df.index.year).sum())
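
As a complement to the snippets above, here is a minimal, self-contained sketch that makes the calendar order explicit by reindexing the columns, rather than relying on the order in which groups appear. The MONTHS list and the random seed are assumptions added for illustration, and the sketch assumes strftime('%b') returns English abbreviations (the default C/English locale).

```python
import numpy as np
import pandas as pd

# Assumption: English month abbreviations, i.e. what strftime('%b') returns
# under the default C/English locale.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Same shape of data as in the question; the seed is only for repeatability.
rng = np.random.default_rng(0)
index = pd.date_range('2000-01-01', periods=700, freq='D')
df = pd.DataFrame(rng.standard_normal(700), index=index, columns=['values'])

wide = (
    df['values']
    .groupby([df.index.year, df.index.strftime('%b')])
    .sum()
    .unstack()                # month abbreviations become the columns
    .reindex(columns=MONTHS)  # force calendar order
)
wide['Year'] = wide.sum(axis=1)  # missing months (NaN) are skipped by default
print(wide.round(1))
```

Because the 700 days end on 30 November 2001, the Dec cell for 2001 stays NaN, while the Year total still covers the eleven months that have data.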

Upvotes: 7

Mohammad Yusuf

Reputation: 17064

You can do something like this:

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])

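# Replace the DatetimeIndex with a three-level index: year, month abbreviation, day
l = [df.index.strftime("%Y"), df.index.strftime("%b"), df.index.strftime("%d")]
df.index = l
# Sum per (year, month), then move the month level into the columns
df = df.groupby(level=[-3, -2]).sum().unstack(-1)
# Row-wise total across all month columns
df['Year'] = df.sum(axis=1)
df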

Upvotes: 3

Nickil Maveli

Reputation: 29711

The only change needed is to unstack the DataFrame to convert it into wide format. Once you have the integer month numbers, you can convert them to datetimes by passing the %m directive as the format, and then use strftime to get their string representation.

Calculate the yearly total by summing across the columns with axis=1.

np.random.seed(314)
fr = df.groupby([df.index.year, df.index.month]).sum().unstack(fill_value=0)
fr.columns = pd.to_datetime(fr.columns.droplevel(0), format='%m').strftime('%b')
fr['Year'] = fr.sum(axis=1)
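
Since the question asks about a pivot, the same wide layout can also be sketched with pivot_table, grouping on the month number so the columns come out in calendar order. This is only an illustration under stated assumptions: MONTH_NAMES is a hypothetical fixed mapping used here instead of the to_datetime/strftime round trip, and the data is regenerated with its own seed.

```python
import numpy as np
import pandas as pd

# Fixed number-to-name mapping (an assumption made for this sketch, so the
# labels do not depend on locale or on date parsing).
MONTH_NAMES = dict(enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1))

rng = np.random.default_rng(314)
index = pd.date_range('2000-01-01', periods=700, freq='D')
df = pd.DataFrame(rng.standard_normal(700), index=index, columns=['values'])

# Rows are years, columns are month numbers 1..12 (already in calendar order).
fr = df.pivot_table(values='values', index=df.index.year,
                    columns=df.index.month, aggfunc='sum')
fr = fr.rename(columns=MONTH_NAMES)

# December 2001 has no data, so that cell is NaN; fr.fillna(0) would show 0.
fr['Year'] = fr.sum(axis=1)
print(fr.round(1))
```

Whether the empty month is left as NaN or filled with 0 (as unstack(fill_value=0) does above) makes no difference to the Year column, because sum skips NaN by default.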

Upvotes: 2

Mathias711

Reputation: 6658

Once the data is in the wide format, you can add the extra Year column with:

df['Year'] = df.sum(axis=1)

This sums the DataFrame row-wise (because of axis=1) and stores the result in a new column.
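
To make the row-wise behaviour concrete, here is a tiny sketch on a made-up two-month table (the numbers are invented for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Tiny made-up wide table: one row per year, one column per month.
wide = pd.DataFrame(
    {'Jan': [1.0, 2.0], 'Feb': [3.0, np.nan]},
    index=[2000, 2001],
)

# axis=1 sums across the columns of each row; NaN is skipped by default.
wide['Year'] = wide.sum(axis=1)
print(wide)
```

The total should be computed while the frame contains only month columns, since any other numeric column present at that point would be included in the sum.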

Upvotes: 0
