Reputation: 2103
I have a DataFrame with a datetime index. The data are only from Dec, Jan and Feb. I am trying to calculate the mean over each Dec, Jan and Feb season. When I do:
df.resample('a').mean()
it gives me the mean of Jan, Feb and Dec of the same calendar year.
Is there any way to do that in a pandas DataFrame?
My data looks like:
2000-02-29 0.046871
2000-03-31 NaN
2000-04-30 NaN
2000-05-31 NaN
2000-06-30 NaN
2000-07-31 NaN
2000-08-31 NaN
2000-09-30 NaN
2000-10-31 NaN
2000-11-30 NaN
2000-12-31 0.015948
2001-01-31 0.020552
2001-02-28 0.033409
2001-03-31 NaN
2001-04-30 NaN
2001-05-31 NaN
2001-06-30 NaN
2001-07-31 NaN
2001-08-31 NaN
2001-09-30 NaN
2001-10-31 NaN
2001-11-30 NaN
2001-12-31 0.013204
2002-01-31 0.017093
2002-02-28 0.019723
2002-03-31 NaN
2002-04-30 NaN
Upvotes: 1
Views: 3074
Reputation: 863801
You need groupby with strftime:
df = df.groupby(df.index.strftime('%b')).mean()
print (df)
col
Dec 0.014576
Feb 0.033334
Jan 0.018822
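For anyone who wants to run this, here is a minimal self-contained sketch. It assumes the column is called col (as in the printed output) and that the all-NaN months have been dropped; the values are copied from the question. Note that the labels produced by '%b' depend on the locale.

```python
import pandas as pd

# Non-NaN rows copied from the question. The column name 'col' is taken
# from the printed output above and is otherwise an assumption.
data = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame({"col": pd.Series(data)})
df.index = pd.to_datetime(df.index)

# Group by abbreviated month name, pooling the same month across all years.
# The '%b' labels are locale-dependent.
monthly_mean = df.groupby(df.index.strftime("%b")).mean()
print(monthly_mean)
```

The result is kept in a separate name (monthly_mean) so that df is still the original frame for the variants that follow.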
If you also want the years:
df = df.groupby(df.index.strftime('%Y-%b')).mean()
print (df)
col
2000-Dec 0.015948
2000-Feb 0.046871
2001-Dec 0.013204
2001-Feb 0.033409
2001-Jan 0.020552
2002-Feb 0.019723
2002-Jan 0.017093
Another solution is to convert the index to monthly periods with to_period:
df = df.groupby(df.index.to_period('M')).mean()
print (df)
col
2000-02 0.046871
2000-12 0.015948
2001-01 0.020552
2001-02 0.033409
2001-12 0.013204
2002-01 0.017093
2002-02 0.019723
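A small sketch of what the to_period variant returns, again assuming a column named col and only the non-NaN rows from the question. The grouped result is indexed by a PeriodIndex, which sorts chronologically (unlike the '%Y-%b' strings); it can be turned back into timestamps with to_timestamp if needed.

```python
import pandas as pd

# Non-NaN rows copied from the question; 'col' is an assumed column name.
data = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame({"col": pd.Series(data)})
df.index = pd.to_datetime(df.index)

# Group by calendar month; 'M' is the alias for monthly periods.
per_month = df.groupby(df.index.to_period("M")).mean()
print(per_month.index.dtype)

# Optional: convert the period labels back to timestamps
# (the start of each month by default).
per_month_ts = per_month.copy()
per_month_ts.index = per_month.index.to_timestamp()
print(per_month_ts)
```

Because the sample data already has one row per month, each group contains a single value here; the grouping only matters when there are several observations within a month.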
EDIT:
You need to shift the index forward by one month, so that each December falls into the same year as the following January and February, and then groupby by year:
year = df.shift(freq='m').index.year
print (year)
Int64Index([2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001,
2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001,
2002, 2002, 2002, 2002, 2002],
dtype='int64')
df = df.groupby(year).mean()
print (df)
col
2000 0.046871
2001 0.023303
2002 0.016673
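The same winter grouping can also be sketched without a frequency alias, which may help if the 'm' alias is not accepted by your pandas version. This is an alternative to shift(freq=...), not the method used above: it labels each row with year + 1 for December and with the plain year otherwise. The column name col and the name winter_year are assumptions for illustration.

```python
import pandas as pd

# Non-NaN rows copied from the question; 'col' is an assumed column name.
data = {
    "2000-02-29": 0.046871,
    "2000-12-31": 0.015948,
    "2001-01-31": 0.020552,
    "2001-02-28": 0.033409,
    "2001-12-31": 0.013204,
    "2002-01-31": 0.017093,
    "2002-02-28": 0.019723,
}
df = pd.DataFrame({"col": pd.Series(data)})
df.index = pd.to_datetime(df.index)

# Label each row with the year in which its winter ends: December counts
# toward the following year, January and February keep their own year.
winter_year = df.index.year + (df.index.month == 12).astype(int)

# Mean per winter season (Dec of the previous year plus Jan and Feb).
winter_mean = df.groupby(winter_year).mean()
print(winter_mean)
```

With the sample data this should give the same three means as the shifted version (0.046871, 0.023303 and 0.016673). The 2000 group contains only February 2000, because December 1999 and January 2000 are not in the sample.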
Upvotes: 4