Reputation: 7203
If I have an empty DataFrame in pandas like this:
df = pandas.DataFrame(columns=['a','b','c'])
>>> df
Empty DataFrame
Columns: [a, b, c]
Index: []
and I aggregate on groups, the output will usually be an empty DataFrame:
>>> df.groupby('a', as_index=False).sum()
Empty DataFrame
Columns: [a, b, c]
Index: []
I say usually because this is not always the case. It works this way for min(), max(), sum(), count(), and quantile(), but not for mean(), which raises an exception:
>>> df.groupby('a', as_index=False).mean()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 666, in mean
return self._cython_agg_general('mean')
File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2358, in _cython_agg_general
new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2408, in _cython_agg_blocks
raise DataError('No numeric types to aggregate')
pandas.core.groupby.DataError: No numeric types to aggregate
Why is the behavior different for this one aggregate function?
I am using pandas 0.14.1 on Python 2.7.
Upvotes: 2
Views: 815
Reputation: 394031
This exception is raised by the genuine groupby functions (see http://pandas.pydata.org/pandas-docs/stable/api.html#id35). When you call sum, you are calling the Series or DataFrame version, which has no such restriction.
So in fact mean, median, sem, std, var and ohlc will all raise an exception.
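Note also that the message refers to the column dtypes: a DataFrame created from column names alone has object-dtype columns, and the same exception would be raised for any non-numerical data.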
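A minimal sketch of the dtype point, assuming a recent pandas (the exception class and message differ from 0.14.1). It shows that the empty frame has no numeric columns at all, and that building it with an explicit numeric dtype (float64 here, an assumption about what the columns will eventually hold) lets the groupby mean return an empty result instead of raising.

```python
import pandas as pd

# Built from column names only: every column gets object dtype.
df = pd.DataFrame(columns=['a', 'b', 'c'])
print(df.dtypes)

# No column counts as numeric, hence "No numeric types to aggregate".
print(df.select_dtypes(include='number').columns.tolist())

# Same columns, but with an explicit numeric dtype (assumed float64).
numeric_df = pd.DataFrame(columns=['a', 'b', 'c'], dtype='float64')
result = numeric_df.groupby('a', as_index=False).mean()

# The aggregate is simply empty rather than an exception.
print(result.empty)
```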
Compare what happens when you call apply with mean:
In [18]:
df.groupby('a', as_index=False).apply(mean)
Out[18]:
Empty DataFrame
Columns: []
Index: []
Here no exception is raised, because the Series or DataFrame version is being applied.
Upvotes: 1
Reputation: 11691
I'm not exactly sure, but I would hypothesize that it's because mean() divides by the number of elements in the DataFrame, which is 0 in this case, and that would cause a division-by-zero error. I would just catch the exception that is thrown.
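A minimal sketch of handling the empty case. Instead of catching the exception, whose class has moved between pandas versions (it is `pandas.core.groupby.DataError` in 0.14.1), this swaps in a different technique: check for an empty frame before aggregating. `groupby_mean_or_empty` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def groupby_mean_or_empty(df, key):
    """Hypothetical helper: group-wise mean that tolerates an empty input frame."""
    if df.empty:
        # Nothing to aggregate: return an empty frame with the same columns.
        return df.iloc[0:0].copy()
    return df.groupby(key, as_index=False).mean()


empty = pd.DataFrame(columns=['a', 'b', 'c'])
print(groupby_mean_or_empty(empty, 'a'))

data = pd.DataFrame({'a': [1, 1, 2], 'b': [1.0, 3.0, 5.0], 'c': [2.0, 4.0, 6.0]})
print(groupby_mean_or_empty(data, 'a'))
```

The guard only covers the empty case; a non-empty frame with object-dtype columns would still fail inside mean().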
Upvotes: 1