bikuser

Reputation: 2103

Calculate average value from a pandas DataFrame

I have a DataFrame with a datetime index. It only contains data for December, January and February. I want the mean over each December, January and February period. When I tried:

df.resample('a').mean()

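it gives me the mean of January, February and December of the same calendar year, rather than of December together with the January and February that follow it.

Is there any way to do that with a pandas DataFrame?

My data looks like: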

2000-02-29    0.046871
2000-03-31         NaN
2000-04-30         NaN
2000-05-31         NaN
2000-06-30         NaN
2000-07-31         NaN
2000-08-31         NaN
2000-09-30         NaN
2000-10-31         NaN
2000-11-30         NaN
2000-12-31    0.015948
2001-01-31    0.020552
2001-02-28    0.033409
2001-03-31         NaN
2001-04-30         NaN
2001-05-31         NaN
2001-06-30         NaN
2001-07-31         NaN
2001-08-31         NaN
2001-09-30         NaN
2001-10-31         NaN
2001-11-30         NaN
2001-12-31    0.013204
2002-01-31    0.017093
2002-02-28    0.019723
2002-03-31         NaN
2002-04-30         NaN

Upvotes: 1

Views: 3074

Answers (1)

jezrael

Reputation: 863801

You need groupby with strftime:

df = df.groupby(df.index.strftime('%b')).mean()
print (df)
          col
Dec  0.014576
Feb  0.033334
Jan  0.018822
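
The labels above sort alphabetically (Dec, Feb, Jan) because strftime returns strings. Here is a minimal sketch of a variant that groups on the integer month instead, so the result sorts in calendar order. It assumes the column is named col, as in the printed output, and rebuilds only the non-NaN rows from the question:

```python
import pandas as pd

# Non-NaN rows copied from the question; the column name "col" is taken
# from the printed output above.
idx = pd.to_datetime([
    "2000-02-29", "2000-12-31", "2001-01-31", "2001-02-28",
    "2001-12-31", "2002-01-31", "2002-02-28",
])
df = pd.DataFrame(
    {"col": [0.046871, 0.015948, 0.020552, 0.033409,
             0.013204, 0.017093, 0.019723]},
    index=idx,
)

# Group on the integer month (1-12) so the result is ordered Jan, Feb, Dec
# rather than alphabetically by month name.
monthly_mean = df.groupby(df.index.month).mean()
print(monthly_mean)
```

The values should match the strftime version; only the index labels (1, 2, 12 instead of Jan, Feb, Dec) and their order differ.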

If you also want the years:

df = df.groupby(df.index.strftime('%Y-%b')).mean()
print (df)
               col
2000-Dec  0.015948
2000-Feb  0.046871
2001-Dec  0.013204
2001-Feb  0.033409
2001-Jan  0.020552
2002-Feb  0.019723
2002-Jan  0.017093

Another solution is to convert the index to monthly periods with to_period:

df = df.groupby(df.index.to_period('M')).mean()
print (df)
              col
2000-02  0.046871
2000-12  0.015948
2001-01  0.020552
2001-02  0.033409
2001-12  0.013204
2002-01  0.017093
2002-02  0.019723

EDIT:

You need to shift the index forward by one month, so that each December lands in the same year as the January and February that follow it, and then group by year:

year = df.shift(freq='m').index.year
print (year)
Int64Index([2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001,
            2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001,
            2002, 2002, 2002, 2002, 2002],
           dtype='int64')


df = df.groupby(year).mean()
print (df)
           col
2000  0.046871
2001  0.023303
2002  0.016673
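
The same idea can be written without a frequency alias string, which may help because the spelling of the month-end alias has changed between pandas versions. This is only a sketch under assumptions: it rebuilds the non-NaN rows from the question, assumes the column is named col, and the names winter_year and winter_mean are made up for illustration:

```python
import pandas as pd

# Non-NaN rows copied from the question (the NaN rows would be skipped
# by mean() anyway); the column name "col" is an assumption.
idx = pd.to_datetime([
    "2000-02-29", "2000-12-31", "2001-01-31", "2001-02-28",
    "2001-12-31", "2002-01-31", "2002-02-28",
])
df = pd.DataFrame(
    {"col": [0.046871, 0.015948, 0.020552, 0.033409,
             0.013204, 0.017093, 0.019723]},
    index=idx,
)

# Move every date forward by one calendar month and take the year:
# December 2000 becomes January 2001, so it is grouped with the
# January and February that follow it.
winter_year = (df.index + pd.DateOffset(months=1)).year

# Mean per winter, labelled by the year in which the winter ends.
winter_mean = df.groupby(winter_year).mean()
print(winter_mean)
```

For the sample data this should reproduce the 2000, 2001 and 2002 means shown above. Note that the first group (2000) contains only February, because December 1999 and January 2000 are not in the data.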

Upvotes: 4
