Reputation: 250
I have a data frame with monthly values and I want to obtain quaterly values as an average of 3 months group My data are like that (only first 9 months for the example)
month 01 02 03 04 05 06 07 08 09 \
year
2000 90.26 90.95 91.04 90.87 90.78 91.13 90.87 90.95 91.30
2000 87.89 89.68 90.10 90.27 90.53 90.87 89.93 91.30 91.98
2000 74.17 74.98 74.74 73.97 74.07 74.26 74.71 76.93 78.67
2000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000 86.74 85.48 87.45 88.31 88.71 88.23 88.08 87.76 88.94
And I want obtain Q1 as the mean of months 01,02,03. i can do:
df['Q1']=(df['01']+df['02']+df['03'])/3
But I will have problems with the Nan.
Can I calculate means in groups of three months?
Upvotes: 1
Views: 103
Reputation: 129078
Using resample.
In [89]: x
Out[89]:
1 2 3 4 5 6 7 8 9
month
2000 90.26 90.95 91.04 90.87 90.78 91.13 90.87 90.95 91.30
2000 87.89 89.68 90.10 90.27 90.53 90.87 89.93 91.30 91.98
2000 74.17 74.98 74.74 73.97 74.07 74.26 74.71 76.93 78.67
2000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000 86.74 85.48 87.45 88.31 88.71 88.23 88.08 87.76 88.94
In [90]: x.columns = pd.PeriodIndex([pd.Period(year=2000, month=m, freq='M')
for m in x.columns])
In [92]: x.index = ['%s_%s' % (y,i) for i, y in enumerate(x.index)]
In [93]: x
Out[93]:
2000-01 2000-02 2000-03 2000-04 2000-05 2000-06 2000-07 2000-08 2000-09
2000_0 90.26 90.95 91.04 90.87 90.78 91.13 90.87 90.95 91.30
2000_1 87.89 89.68 90.10 90.27 90.53 90.87 89.93 91.30 91.98
2000_2 74.17 74.98 74.74 73.97 74.07 74.26 74.71 76.93 78.67
2000_3 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000_4 86.74 85.48 87.45 88.31 88.71 88.23 88.08 87.76 88.94
In [94]: x.resample('Q', axis=1)
Out[94]:
2000Q1 2000Q2 2000Q3
2000_0 90.750000 90.926667 91.04
2000_1 89.223333 90.556667 91.07
2000_2 74.630000 74.100000 76.77
2000_3 NaN NaN NaN
2000_4 86.556667 88.416667 88.26
There is a bug in resampling on a duplicate index that's why I renamed the index here, it is fixed for 0.13 (but this solution was using 0.12).
This is ultimately the most flexible as you can now resample by different frequencies.
In [95]: x.resample('Q-JAN', axis=1)
Out[95]:
2000Q4 2001Q1 2001Q2 2001Q3
2000_0 90.26 90.953333 90.926667 91.125
2000_1 87.89 90.016667 90.443333 91.640
2000_2 74.17 74.563333 74.346667 77.800
2000_3 NaN NaN NaN NaN
2000_4 86.74 87.080000 88.340000 88.350
Upvotes: 1
Reputation: 881037
import pandas as pd
import io
content = io.BytesIO('''\
year 01 02 03 04 05 06 07 08 09
2000 90.26 90.95 91.04 90.87 90.78 91.13 90.87 90.95 91.30
2000 87.89 89.68 90.10 90.27 90.53 90.87 89.93 91.30 91.98
2000 74.17 74.98 74.74 73.97 74.07 74.26 74.71 76.93 78.67
2000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000 86.74 85.48 87.45 88.31 88.71 88.23 88.08 87.76 88.94''')
df = pd.read_table(content, sep='\s+', index_col=0)
df.columns.name='month'
df2 = df.groupby(by=lambda x: (int(x)-1)//3, axis=1).mean()
df2.columns='Q1 Q2 Q3'.split()
print(df2)
Q1 Q2 Q3
year
2000 90.750000 90.926667 91.04
2000 89.223333 90.556667 91.07
2000 74.630000 74.100000 76.77
2000 NaN NaN NaN
2000 86.556667 88.416667 88.26
You can join these columns to the original dataframe with
df = pd.concat([df2, df], axis=1)
Upvotes: 0
Reputation: 375925
You can do this manually using loc and mean:
In [11]: df.loc[:, ['01', '02', '03']]
Out[11]:
01 02 03
year
2000 90.26 90.95 91.04
2000 87.89 89.68 90.10
2000 74.17 74.98 74.74
2000 NaN NaN NaN
2000 86.74 85.48 87.45
In [12]: df.loc[:, ['01', '02', '03']].mean(axis=1)
Out[12]:
year
2000 90.750000
2000 89.223333
2000 74.630000
2000 NaN
2000 86.556667
dtype: float64
But it might make more sense to use pandas' rolling_mean:
In [21]: pd.rolling_mean(df.T, 3)
Out[21]:
year 2000 2000 2000 2000 2000
month
01 NaN NaN NaN NaN NaN
02 NaN NaN NaN NaN NaN
03 90.750000 89.223333 74.630000 NaN 86.556667
04 90.953333 90.016667 74.563333 NaN 87.080000
05 90.896667 90.300000 74.260000 NaN 88.156667
06 90.926667 90.556667 74.100000 NaN 88.416667
07 90.926667 90.443333 74.346667 NaN 88.340000
08 90.983333 90.700000 75.300000 NaN 88.023333
09 91.040000 91.070000 76.770000 NaN 88.260000
By default this looks at 3 periods back, so we have to shift it two up:
In [22]: pd.rolling_mean(df.T, 3).shift(-2)
Out[22]:
year 2000 2000 2000 2000 2000
month
01 90.750000 89.223333 74.630000 NaN 86.556667
02 90.953333 90.016667 74.563333 NaN 87.080000
03 90.896667 90.300000 74.260000 NaN 88.156667
04 90.926667 90.556667 74.100000 NaN 88.416667
05 90.926667 90.443333 74.346667 NaN 88.340000
06 90.983333 90.700000 75.300000 NaN 88.023333
07 91.040000 91.070000 76.770000 NaN 88.260000
08 NaN NaN NaN NaN NaN
09 NaN NaN NaN NaN NaN
and transpose to get of in the correct form:
In [23]: pd.rolling_mean(df.T, 3).shift(-2).T
Out[23]:
month 01 02 03 04 05 06 07 08 09
year
2000 90.750000 90.953333 90.896667 90.926667 90.926667 90.983333 91.04 NaN NaN
2000 89.223333 90.016667 90.300000 90.556667 90.443333 90.700000 91.07 NaN NaN
2000 74.630000 74.563333 74.260000 74.100000 74.346667 75.300000 76.77 NaN NaN
2000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000 86.556667 87.080000 88.156667 88.416667 88.340000 88.023333 88.26 NaN NaN
Upvotes: 2