Reputation: 1142
I have a function that will calculate the means of values in column "A" of a pandas dataframe that fall on the same date. Is there an approach to do this that is more efficient than the list comprehension in the last line?
import pandas as pd
import numpy as np
def daily_mean(df,date,col):
return df[date][col].mean()
data = np.random.rand(100)
columns = ['A']
times = pd.date_range('20180101',freq='9H',periods=100)
df = pd.DataFrame(data=data,index=times,columns=columns)
dates = df.index.strftime('%Y%m%d').unique()
means = [daily_mean(df,date,'A') for date in dates]
Upvotes: 2
Views: 8953
Reputation: 1726
You can do:
df.reset_index().groupby(pd.Grouper(key='index', freq='D'))['A'].mean()
index
2018-01-01 0.412354
2018-01-02 0.301209
2018-01-03 0.710357
2018-01-04 0.158135
2018-01-05 0.720308
...
Upvotes: 3
Reputation: 402263
Yes, you can use the pd.Grouper
API. Also, this will retain the DataFrame, so it's a win-win.
means = df.groupby(pd.Grouper(freq='1D')).mean()
Or, you can group on the floor of the datetime index.
means = df.groupby(df.index.floor('D')).mean()
means.head(10)
A
2018-01-01 0.412354
2018-01-02 0.301209
2018-01-03 0.710357
2018-01-04 0.158135
2018-01-05 0.720308
2018-01-06 0.281862
2018-01-07 0.489486
2018-01-08 0.287808
2018-01-09 0.463117
2018-01-10 0.512963
I prefer approach #1 since pd.Grouper
should be more efficient than grouping on an unrelated series/array (as in the second form).
Upvotes: 9