Faster way to groupby time of day in pandas

Question

I have a time series of several days of 1-minute data, and would like to average it across all days by time of day.

This is very slow:

from datetime import datetime
from pandas import date_range, Series
time_ind = date_range(datetime(2013, 1, 1), datetime(2013, 1, 10), freq='1min')
all_data = Series(randn(len(time_ind)), time_ind)
time_mean = all_data.groupby(lambda x: x.time()).mean()

Takes almost a minute to run!

While something like:

time_mean = all_data.groupby(lambda x: x.minute).mean()

takes only a fraction of a second.

Is there a faster way to group by time of day?

Any idea why this is so slow?

bmu · Accepted Answer

Both your "lambda-version" and the time property introduced in version 0.11 seems to be slow in version 0.11.0:

In [4]: %timeit all_data.groupby(all_data.index.time).mean()
1 loops, best of 3: 11.8 s per loop

In [5]: %timeit all_data.groupby(lambda x: x.time()).mean()
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in  ignored
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in  ignored
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in  ignored
1 loops, best of 3: 11.8 s per loop

With the current master both methods are considerably faster:

In [1]: pd.version.version
Out[1]: '0.11.1.dev-06cd915'

In [5]: %timeit all_data.groupby(lambda x: x.time()).mean()
1 loops, best of 3: 215 ms per loop

In [6]: %timeit all_data.groupby(all_data.index.time).mean()
10 loops, best of 3: 113 ms per loop
'0.11.1.dev-06cd915'

So you can either update to a master or wait for 0.11.1 which should be released this month.

Faster way to groupby time of day in pandas

Answers (2)

Related Questions