Reputation: 1339
The task is to transform the table below
import pandas as pd
import numpy as np
index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])
df.groupby(by=[df.index.year, df.index.month]).sum()
In[1]: df
Out[1]:
values
2000 1 1.181000
2 -8.005783
3 6.590623
4 -6.266232
5 1.266315
6 0.384050
7 -1.418357
8 -3.132253
9 0.005496
10 -6.646101
11 9.616482
12 3.960872
2001 1 -0.989869
2 -2.845278
3 -1.518746
4 2.984735
5 -2.616795
6 8.360319
7 5.659576
8 0.279863
9 -5.220678
10 5.077400
11 1.332519
such that it looks like this
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1.2 -8.0 6.6 -6.3 1.2 0.4 -1.4 -3.1 0.0 -6.6 9.6 3.9
2001 -0.9 -2.8 -1.5 3.0 -2.6 8.3 5.7 0.3 -5.2 5.1 1.3
Additionally, I need to add an extra column that holds the yearly sum of the values, like this
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Year
2000 1.2 -8.0 6.6 -6.3 1.2 0.4 -1.4 -3.1 0.0 -6.6 9.6 3.9 4.7
2001 -0.9 -2.8 -1.5 3.0 -2.6 8.3 5.7 0.3 -5.2 5.1 1.3 10.7
Is there a quick, pivot-style way to do this in pandas?
Upvotes: 5
Views: 1696
Reputation: 294348
Use strftime('%b') in your groupby:
df['values'].groupby([df.index.year, df.index.strftime('%b')]).sum().unstack()
To preserve the order of the months, pass sort=False:
df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum().unstack()
With 'Year' at the end:
df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum() \
.unstack().assign(Year=df['values'].groupby(df.index.year).sum())
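If you would rather not depend on the group order surviving the unstack, here is a minimal sketch that reorders the columns explicitly. The names wide and month_order are my own, the random seed is arbitrary, and it assumes the default (English) locale so that strftime('%b') and calendar.month_abbr agree.

```python
import calendar

import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is only for reproducibility.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=700, freq="D")
df = pd.DataFrame(rng.standard_normal(700), index=index, columns=["values"])

# Sum per (year, abbreviated month name), then move the month level into columns.
wide = df["values"].groupby([df.index.year, df.index.strftime("%b")]).sum().unstack()

# Put the columns into calendar order explicitly instead of relying on group order.
month_order = list(calendar.month_abbr)[1:]  # 'Jan' .. 'Dec' in the default locale
wide = wide.reindex(columns=month_order)

# Row-wise total; months with no data (NaN) are skipped by default.
wide["Year"] = wide.sum(axis=1)
print(wide.round(1))
```

The 700 daily rows end on 2001-11-30, so the Dec cell for 2001 stays NaN, which matches the gap in the desired output.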
Upvotes: 7
Reputation: 17064
You can do something like this:
import pandas as pd
import numpy as np
index = pd.date_range('2000-1-1', periods=700, freq='D')
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])
l = [df.index.strftime("%Y"), df.index.strftime("%b"), df.index.strftime("%d")]
df.index = l
df=df.groupby(level=[-3,-2]).sum().unstack(-1)
df['Year'] = df.sum(axis=1)
df
Upvotes: 3
Reputation: 29711
The only change you need is to unstack the DataFrame to convert it into wide format. Once you have the integer month numbers, you can convert them to datetimes by specifying the %m directive as the format, and then use strftime to get their string representation.
Calculate the yearly total by summing across the columns with axis=1.
np.random.seed(314)
fr = df.groupby([df.index.year, df.index.month]).sum().unstack(fill_value=0)
fr.columns = pd.to_datetime(fr.columns.droplevel(0), format='%m').strftime('%b')
fr['Year'] = fr.sum(1)
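Since the question asks for a pivot, the same integer-month idea can probably also be written with pivot_table. This is only a sketch under a few assumptions: the helper columns year and month and the name tmp are mine, and I rename the columns with the standard library's calendar.month_abbr instead of the to_datetime round trip above. Unlike fill_value=0, it leaves missing months as NaN.

```python
import calendar

import numpy as np
import pandas as pd

np.random.seed(314)
index = pd.date_range("2000-01-01", periods=700, freq="D")
df = pd.DataFrame(np.random.randn(700), index=index, columns=["values"])

# 'year' and 'month' are helper columns introduced only for this sketch.
tmp = df.assign(year=df.index.year, month=df.index.month)

# Integer months keep the columns in calendar order (1..12).
fr = tmp.pivot_table(index="year", columns="month", values="values", aggfunc="sum")

# Rename 1..12 to abbreviated month names (English in the default locale).
fr.columns = [calendar.month_abbr[m] for m in fr.columns]

# Yearly total across the month columns.
fr["Year"] = fr.sum(axis=1)
print(fr.round(1))
```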
Upvotes: 2
Reputation: 6658
You can add the extra Year column with:
df['Year'] = df.sum(axis=1)
This sums the DataFrame row-wise (because of axis=1) and stores the result in a new column.
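A tiny hand-made example of what that does when a month is missing, as with December 2001 in the question. The frame wide and its numbers are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Small wide table; 2001 has no December value.
wide = pd.DataFrame(
    {"Oct": [1.0, 2.0], "Nov": [3.0, 4.0], "Dec": [5.0, np.nan]},
    index=[2000, 2001],
)

# axis=1 sums across the columns of each row; NaN is skipped by default.
wide["Year"] = wide.sum(axis=1)
print(wide["Year"].tolist())  # [9.0, 6.0]
```

Note that running the assignment a second time would include the existing Year column in the total, so compute it once, right after building the month columns.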
Upvotes: 0