Reputation: 1339

Reshaping Pandas dataframe by months

The task is to transform the below table

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])

df.groupby(by=[df.index.year, df.index.month]).sum()

In[1]: df

Out[1]: 
           values
2000 1   1.181000
     2  -8.005783
     3   6.590623
     4  -6.266232
     5   1.266315
     6   0.384050
     7  -1.418357
     8  -3.132253
     9   0.005496
     10 -6.646101
     11  9.616482
     12  3.960872
2001 1  -0.989869
     2  -2.845278
     3  -1.518746
     4   2.984735
     5  -2.616795
     6   8.360319
     7   5.659576
     8   0.279863
     9  -5.220678
     10  5.077400
     11  1.332519

such that it looks like this

      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  
2000  1.2  -8.0 6.6  -6.3 1.2  0.4  -1.4 -3.1 0.0  -6.6 9.6  3.9    
2001  -0.9 -2.8 -1.5 3.0  -2.6 8.3  5.7  0.3  -5.2 5.1  1.3

Additionally I need to add an extra column which sums the yearly values like this

      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Year
2000  1.2  -8.0 6.6  -6.3 1.2  0.4  -1.4 -3.1 0.0  -6.6 9.6  3.9  4.7
2001  -0.9 -2.8 -1.5 3.0  -2.6 8.3  5.7  0.3  -5.2 5.1  1.3       10.7

Is there a quick pandas pivotal way to solve this?

Upvotes: 5

Answers (4)

piRSquared

Reputation: 294348

use strftime('%b') in your groupby

df['values'].groupby([df.index.year, df.index.strftime('%b')]).sum().unstack()

To preserve order of months

df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum().unstack()

With 'Year' at end

df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum() \
    .unstack().assign(Year=df.groupby(df.index.year).sum())

Upvotes: 7

Mohammad Yusuf

Reputation: 17064

You can do something like this:

import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])

l = [df.index.strftime("%Y"), df.index.strftime("%b"), df.index.strftime("%d")]
df.index = l
df=df.groupby(level=[-3,-2]).sum().unstack(-1)
df['Year'] = df.sum(axis=1)
df

Output:

Upvotes: 3

Nickil Maveli

Reputation: 29711

Only change is you need to unstack the DF to convert it into a wide format. Once you get the integer month numbers, you could convert these into a datetime by specifying %m directive as the format to be considered. After obtaining this, use it to retrieve it's string representation through the help of strftime.

Calculate the year by taking it's sum across columns by specifying axis=1.

np.random.seed(314)
fr = df.groupby([df.index.year, df.index.month]).sum().unstack(fill_value=0)
fr.columns = pd.to_datetime(fr.columns.droplevel(0), format='%m').strftime('%b')
fr['Year'] = fr.sum(1)

Upvotes: 2

Mathias711

Reputation: 6658

The extra Year column you can do by doing

df['Year'] = df.sum(axis=1)

It will sum the dataframe row-wise (due to the axis=1), and storing it in a new column.

Upvotes: 0

Reshaping Pandas dataframe by months

Answers (4)

Related Questions