Why does mean() have different behavior on empty DataFrames?

Question

If I have an empty DataFrame in pandas like this:

>>> import pandas
>>> df = pandas.DataFrame(columns=['a','b','c'])
>>> df
Empty DataFrame
Columns: [a, b, c]
Index: []

and I aggregate on groups, the output will usually be an empty DataFrame:

>>> df.groupby('a', as_index=False).sum()
Empty DataFrame
Columns: [a, b, c]
Index: []

I say usually because this is not always the case. It works this way for min(), max(), sum(), count(), and quantile(), but not for mean(), which raises an exception:

>>> df.groupby('a', as_index=False).mean()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 666, in mean
    return self._cython_agg_general('mean')
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2358, in _cython_agg_general
    new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2408, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.groupby.DataError: No numeric types to aggregate

Why is the behavior different for this one aggregate function?

I am using pandas 0.14.1 on Python 2.7.

EdChum · Accepted Answer

This exception is raised only by the genuine groupby functions (see http://pandas.pydata.org/pandas-docs/stable/api.html#id35). When you call sum, the Series or DataFrame version is called instead, and that version has no such restriction.

So in fact mean, median, sem, std, var and ohlc will all raise an exception.

Note also that the same exception would be raised if you had non-numerical data.
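
As a rough illustration of the non-numerical point, the sketch below inspects the empty frame from the question. It assumes a reasonably recent pandas imported as pd (the question uses 0.14.1, so details may differ). A frame built from column names alone typically gets the object dtype for every column, so there is nothing numeric for the groupby mean to aggregate.

```python
import pandas as pd

# An empty frame built from column names only: no data, so no numeric dtype is inferred.
df = pd.DataFrame(columns=['a', 'b', 'c'])

# Map each column to its dtype name (typically 'object' for all three).
dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}

# Columns that pandas considers numeric; none are expected here.
numeric_cols = list(df.select_dtypes(include='number').columns)

print(dtypes)
print(numeric_cols)
```

If the columns are given a numeric dtype when the frame is created, the list of numeric columns is no longer empty, which may be enough to avoid the error on some versions.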

Compare what happens when you call apply with mean:

In [18]: df.groupby('a', as_index=False).apply(mean)
Out[18]:
Empty DataFrame
Columns: []
Index: []

Here no exception is raised, because the Series or DataFrame version of mean is being applied.
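
If the goal is simply to get an empty result back instead of an exception, one possible workaround is to check for emptiness before aggregating. This is only a sketch: group_mean_or_empty is a hypothetical helper name, not part of pandas, and returning a copy of the empty input is an assumption about what the caller wants.

```python
import pandas as pd


def group_mean_or_empty(df, key):
    """Hypothetical helper: group mean that tolerates an empty input frame."""
    if df.empty:
        # Nothing to aggregate; hand back an empty frame with the same columns.
        return df.copy()
    return df.groupby(key, as_index=False).mean()


empty = pd.DataFrame(columns=['a', 'b', 'c'])
filled = pd.DataFrame({'a': [1, 1, 2], 'b': [1.0, 3.0, 5.0], 'c': [2.0, 4.0, 6.0]})

empty_result = group_mean_or_empty(empty, 'a')
filled_result = group_mean_or_empty(filled, 'a')

print(empty_result)
print(filled_result)
```

The guard leaves the non-empty path untouched, so a frame with non-numerical columns would still hit the normal groupby mean behavior.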
