Reputation: 299
I'm working with time series and trying to write a function that calculates the monthly average of my data. Here are some helper functions:
import datetime
import numpy
import pandas

def date_range_0(start, end):
    # inclusive range of daily datetimes from start to end
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)

def date_range_1(start, days):
    # days should be an integer
    return date_range_0(start, start + datetime.timedelta(days - 1))

x = date_range_1(datetime.datetime(2015, 5, 17), 4)
The output, x, is a simple array of datetimes:
array([datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried an example from that page and it works fine:
df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
'key2': [2015001,2015001,2015001,2015001,2015001],
'data1': 1+0.1*numpy.arange(1,6)
})
df
gives
data1 key1 key2
0 1.1 2015-01-17 2015001
1 1.2 2015-01-18 2015001
2 1.3 2015-01-19 2015001
3 1.4 2015-01-20 2015001
4 1.5 2015-01-21 2015001
and
grouped=df['data1'].groupby(df['key2'])
grouped.mean()
gives
key2
2015001 1.3
Name: data1, dtype: float64
Then I try my own example:
datedat=numpy.array([date_range_1(datetime.datetime(2015, 1, 17),5),1+0.1*numpy.arange(1,6)]).T
months = [day.month for day in datedat[:,0]]
years = [day.year for day in datedat[:,0]]
datedatF = pandas.DataFrame({'key1': datedat[:,0],
                             'key2': list(numpy.array(years)*1000 + numpy.array(months)),
                             'data1': datedat[:,1]})
datedatF
which generated
data1 key1 key2
0 1.1 2015-01-17 2015001
1 1.2 2015-01-18 2015001
2 1.3 2015-01-19 2015001
3 1.4 2015-01-20 2015001
4 1.5 2015-01-21 2015001
Note that this is exactly the same table as above. So far so good. Then I run:
grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()
which throws this:
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
1017 nv.validate_groupby_func('mean', args, kwargs)
1018 try:
-> 1019 return self._cython_agg_general('mean')
1020 except GroupByError:
1021 raise
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
806
807 if len(output) == 0:
--> 808 raise DataError('No numeric types to aggregate')
809
810 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
What did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It looks exactly the same as the successful example!
Upvotes: 10
Views: 23396
Reputation: 811
import pandas as pd

dictgrp = {'Company': 'Goog Goog msft msft fb fb'.split(),
           'Person': 'Sam Charlie amy vanessa carl sarah'.split(),
           'Sales': '200 130 340 124 243 350'.split()}  # Sales values are strings
df4 = pd.DataFrame(data=dictgrp)
print(df4)
Company Person Sales
0 Goog Sam 200
1 Goog Charlie 130
2 msft amy 340
3 msft vanessa 124
4 fb carl 243
5 fb sarah 350
grpdf=pd.to_numeric(df4['Sales']).groupby(df4['Company'])
print(grpdf.mean())
Company
Goog 165.0
fb 296.5
msft 232.0
Name: Sales, dtype: float64
Upvotes: 0
Reputation: 323226
The data1 column in your DataFrame has dtype object, so you need to convert it with pd.to_numeric:
datedatF.dtypes
Out[39]:
data1 object
key1 datetime64[ns]
key2 int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]:
key2
2015001 1.3
Name: data1, dtype: float64
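As for why the column ends up as object in the first place: a likely cause is that the dates and the floats were stacked into a single NumPy array, which can only have one dtype, so everything falls back to object. The following is a minimal sketch of that effect and of one way around it (passing each column to the DataFrame separately). The names start, dates, values, stacked, mixed and clean are made up for illustration and do not come from the question.

```python
import datetime

import numpy
import pandas

# Hypothetical stand-ins for the question's inputs.
start = datetime.datetime(2015, 1, 17)
dates = numpy.array([start + datetime.timedelta(days=i) for i in range(5)])
values = 1 + 0.1 * numpy.arange(1, 6)

# One array holding both datetimes and floats can only be dtype=object.
stacked = numpy.array([dates, values]).T

# A column sliced out of that array keeps the object dtype.
mixed = pandas.DataFrame({'data1': stacked[:, 1]})
mixed_dtype = mixed['data1'].dtype

# Passing the columns separately lets data1 stay float64.
key2 = [d.year * 1000 + d.month for d in dates]
clean = pandas.DataFrame({'key1': dates, 'key2': key2, 'data1': values})
clean_dtype = clean['data1'].dtype

# With a numeric column the groupby mean works without any conversion.
monthly_mean = clean.groupby('key2')['data1'].mean()
print(mixed_dtype, clean_dtype)
print(monthly_mean)
```

If the data really does arrive as one mixed array, converting with pd.to_numeric as shown above remains the simpler fix.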
Upvotes: 9
Reputation: 210842
Your data1 column is of object dtype, which pandas does not treat as numeric:
In [396]: datedatF.dtypes
Out[396]:
data1 object # <--- NOTE!
key1 datetime64[ns]
key2 int64
dtype: object
so try this:
In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
.groupby('key2')['data1'].mean()
Out[397]:
key2
2015001 1.3
Name: data1, dtype: float64
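A note on errors='coerce': with the default setting, pd.to_numeric raises on any value it cannot parse, while 'coerce' turns such values into NaN, which mean() then skips. The sketch below illustrates the difference on a small made-up Series (raw is hypothetical data, not taken from the question).

```python
import pandas as pd

# Hypothetical column with one entry that is not a number.
raw = pd.Series(['1.1', 'n/a', '1.3'])

# Default behaviour (errors='raise'): the bad entry raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors='coerce': the bad entry becomes NaN instead of raising.
coerced = pd.to_numeric(raw, errors='coerce')
n_bad = int(coerced.isna().sum())

# mean() skips NaN by default, so only 1.1 and 1.3 are averaged.
mean_val = coerced.mean()
print(raised, n_bad, mean_val)
```

Coercion can hide bad input silently, so it may be worth checking how many values became NaN before trusting the average.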
Upvotes: 5