stephenb
stephenb

Reputation: 1142

Compute daily mean with pandas

I have a function that will calculate the means of values in column "A" of a pandas dataframe that fall on the same date. Is there an approach to do this that is more efficient than the list comprehension in the last line?

import pandas as pd
import numpy as np

def daily_mean(df,date,col):
     return df[date][col].mean()

data = np.random.rand(100)
columns = ['A']
times = pd.date_range('20180101',freq='9H',periods=100)
df = pd.DataFrame(data=data,index=times,columns=columns)

dates = df.index.strftime('%Y%m%d').unique()
means = [daily_mean(df,date,'A') for date in dates]

Upvotes: 2

Views: 8953

Answers (2)

Vishnu Kunchur
Vishnu Kunchur

Reputation: 1726

You can do:

df.reset_index().groupby(pd.Grouper(key='index', freq='D'))['A'].mean()

index
2018-01-01    0.412354
2018-01-02    0.301209
2018-01-03    0.710357
2018-01-04    0.158135
2018-01-05    0.720308
...

Upvotes: 3

cs95
cs95

Reputation: 402263

Yes, you can use the pd.Grouper API. Also, this will retain the DataFrame, so it's a win-win.

means = df.groupby(pd.Grouper(freq='1D')).mean()

Or, you can group on the floor of the datetime index.

means = df.groupby(df.index.floor('D')).mean()

means.head(10)
                   A
2018-01-01  0.412354
2018-01-02  0.301209
2018-01-03  0.710357
2018-01-04  0.158135
2018-01-05  0.720308
2018-01-06  0.281862
2018-01-07  0.489486
2018-01-08  0.287808
2018-01-09  0.463117
2018-01-10  0.512963

I prefer approach #1 since pd.Grouper should be more efficient than grouping on an unrelated series/array (as in the second form).

Upvotes: 9

Related Questions