split-apply-combine on pandas timedelta column

Question

I have a DataFrame with a column of timedeltas (upon inspection, the dtype is timedelta64[ns]), and I'd like to do a split-apply-combine, but the timedelta column is being dropped:



import pandas as pd

import numpy as np

pd.__version__
Out[3]: '0.13.0rc1'

np.__version__
Out[4]: '1.8.0'

data = pd.DataFrame(np.random.rand(10, 3), columns=['f1', 'f2', 'td'])

data['td'] *= 10000000

data['td'] = pd.Series(data['td'], dtype='timedelta64[ns]')


Or, forcing pandas to try the operation on the 'td' column:

data.groupby(data.index < 5)['td'].mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
 in ()
----> 1 data.groupby(data.index < 5)['td'].mean()

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in mean(self)
    417         """
    418         try:
--> 419             return self._cython_agg_general('mean')
    420         except GroupByError:
    421             raise

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
    669 
    670         if len(output) == 0:
--> 671             raise DataError('No numeric types to aggregate')
    672 
    673         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate


However, taking the mean of the column works fine, so numeric operations should be possible:

data['td'].mean()
Out[11]: 
0   00:00:00.003734
dtype: timedelta64[ns]


Obviously it's easy enough to coerce to float before doing the groupby, but I figured I might as well try to understand what I'm running into.

Edit: See https://github.com/pydata/pandas/issues/5724

ontologist · Accepted Answer

It turns out this is a pandas issue: this behavior needs to be implemented in groupby.py.

In the meantime, please enjoy this workaround that casts to float (units of seconds):

data['td'] = [10**-9 * float(td) for td in data['td']]
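The same idea can be written without the Python-level loop. What follows is only a sketch, and it assumes a more recent pandas than the 0.13.0rc1 used in the question, one where the .dt.total_seconds() accessor is available. The column name td_seconds and the seeded random generator are made up for illustration. It converts the timedeltas to float seconds, groups on the float column, and turns the group means back into timedeltas:

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (seeded for repeatability).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 3)), columns=['f1', 'f2', 'td'])

# Treat the scaled values as whole nanoseconds and build a timedelta column.
data['td'] = pd.to_timedelta((data['td'] * 10000000).astype('int64'), unit='ns')

# Cast to float seconds in a separate (hypothetical) column so the
# original timedelta column is kept intact.
data['td_seconds'] = data['td'].dt.total_seconds()

# Group on the float column, which is numeric and therefore aggregates.
means = data.groupby(data.index < 5)['td_seconds'].mean()

# Convert the per-group means back to timedeltas for display.
means_td = pd.to_timedelta(means, unit='s')
print(means_td)
```

Keeping the float values in their own column avoids overwriting 'td', so the original timedeltas stay available after the aggregation.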
