Harry

Reputation: 299

"No numeric types to aggregate" after groupby and mean

I'm working with time series and trying to write a function that calculates the monthly average of the data. Here are some helper functions:

import datetime
import numpy
import pandas
def date_range_0(start,end):

    dates = [start + datetime.timedelta(days=i) 
            for i in range((end-start).days+1)]
    return numpy.array(dates)
def date_range_1(start,days):
    # days should be an integer

    return date_range_0(start,start+datetime.timedelta(days-1))

x=date_range_1(datetime.datetime(2015, 5, 17),4)

The output, x, is a simple array of datetimes:

array([datetime.datetime(2015, 5, 17, 0, 0),
   datetime.datetime(2015, 5, 18, 0, 0),
   datetime.datetime(2015, 5, 19, 0, 0),
   datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)

Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried one example from that page and it works fine:

df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
              'key2': [2015001,2015001,2015001,2015001,2015001],
              'data1': 1+0.1*numpy.arange(1,6)
        })
df

gives

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

and

grouped=df['data1'].groupby(df['key2'])
grouped.mean()

gives

key2
2015001    1.3
Name: data1, dtype: float64

Then I tried my own example:

datedat=numpy.array([date_range_1(datetime.datetime(2015, 1, 17),5),1+0.1*numpy.arange(1,6)]).T
months = [day.month for day in datedat[:,0]]
years = [day.year for day in datedat[:,0]]
datedatF = pandas.DataFrame({'key1': datedat[:,0],
                             'key2': list(numpy.array(years)*1000 + numpy.array(months)),
                             'data1': datedat[:,1]})
datedatF

which generated

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

Note that this is exactly the same table as above. So far so good. Then I ran:

grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()

which raises this:

   ---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
  1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     mean(self, *args, **kwargs)
   1017         nv.validate_groupby_func('mean', args, kwargs)
   1018         try:
-> 1019             return self._cython_agg_general('mean')
   1020         except GroupByError:
   1021             raise

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     _cython_agg_general(self, how, numeric_only)
    806 
    807         if len(output) == 0:
--> 808             raise DataError('No numeric types to aggregate')
    809 
    810         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

What did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It looks exactly the same as the successful example!

Upvotes: 10

Views: 23396

Answers (3)

sameer_nubia

Reputation: 811

Convert the string column with pd.to_numeric before the group by:

dictgrp={'Company':'Goog Goog msft msft fb fb'.split(),
         'Person':'Sam Charlie amy vanessa carl sarah'.split(),
         'Sales':'200 130 340 124 243 350'.split()}

df4=pd.DataFrame(data=dictgrp)
print(df4)
  Company   Person Sales
0    Goog      Sam   200
1    Goog  Charlie   130
2    msft      amy   340
3    msft  vanessa   124
4      fb     carl   243
5      fb    sarah   350

grpdf=pd.to_numeric(df4['Sales']).groupby(df4['Company'])
print(grpdf.mean())
Company
Goog    165.0
fb      296.5
msft    232.0
Name: Sales, dtype: float64

Upvotes: 0

BENY

Reputation: 323226

The dtype of data1 in your DataFrame is object, so you need to convert it with pd.to_numeric:

datedatF.dtypes
Out[39]: 
data1            object
key1     datetime64[ns]
key2              int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]: 
key2
2015001    1.3
Name: data1, dtype: float64
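
As for why data1 ends up as object in the first place: a minimal sketch, assuming the cause is the single numpy array that holds both the datetimes and the floats (as datedat does in the question). The names mixed, bad, good and monthly_mean are made up for illustration. In the pandas version from the question, taking a group mean of such a column raises the DataError shown above; other versions may behave differently.

```python
import datetime

import numpy as np
import pandas as pd

dates = [datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i) for i in range(5)]
values = 1 + 0.1 * np.arange(1, 6)

# One array holding both datetimes and floats can only be dtype=object,
# so every column sliced from it is an object column too.
mixed = np.array([dates, values], dtype=object).T
bad = pd.DataFrame({"key2": [2015001] * 5, "data1": mixed[:, 1]})
bad_dtype = bad["data1"].dtype  # object, even though the values print like floats

# Passing the float array directly keeps its float64 dtype,
# so no conversion is needed before aggregating.
good = pd.DataFrame({
    "key1": dates,
    "key2": [d.year * 1000 + d.month for d in dates],
    "data1": values,
})
monthly_mean = good.groupby("key2")["data1"].mean()
```

Building the DataFrame from separate, homogeneous columns avoids the problem at the source; pd.to_numeric is the repair when the object column already exists.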

Upvotes: 9

MaxU - stand with Ukraine

Reputation: 210842

Your data1 column is of object dtype (generic Python objects rather than a numeric dtype):

In [396]: datedatF.dtypes
Out[396]:
data1            object   # <--- NOTE!
key1     datetime64[ns]
key2              int64
dtype: object

so try this:

In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
                  .groupby('key2')['data1'].mean()
Out[397]:
key2
2015001    1.3
Name: data1, dtype: float64
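
A note on errors='coerce': it is not needed for the data in the question, where every value is a valid number. The sketch below uses a made-up Series (raw, with one unparseable entry) to show what it does: values that cannot be parsed become NaN instead of raising, and mean() skips NaN by default.

```python
import pandas as pd

# Hypothetical input: numbers stored as strings, plus one bad entry.
raw = pd.Series(["1.1", "1.2", "n/a", "1.5"], dtype=object)

# Unparseable entries are turned into NaN rather than raising an error.
coerced = pd.to_numeric(raw, errors="coerce")

# Count how many values were lost to coercion before trusting the result.
n_bad = int(coerced.isna().sum())

# mean() ignores NaN, so this averages only the three valid values.
mean_value = coerced.mean()
```

Because coercion silently drops bad values from the average, checking n_bad first is a reasonable habit; leaving errors at its default makes bad input fail loudly instead.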

Upvotes: 5
