Eoin

Reputation: 595

Mean of Pandas TimeSeries using .groupby()

Hi,

I have some continuous x/y coordinates from a behavioural experiment that I would like to average within groups using Pandas.

I'm using a subset of the data here.

data
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2036 entries, 0 to 1623
Data columns (total 9 columns):
id               2036  non-null values
subject          2036  non-null values
code             2036  non-null values
acc              2036  non-null values
nx               2036  non-null values
ny               2036  non-null values
rx               2036  non-null values
ry               2036  non-null values
reaction_time    2036  non-null values
dtypes: bool(1), int64(3), object(5)

nx and ny hold a series of TimeSeries objects, all of which have the same indices.

data.nx.iloc[0]
Out[16]: 
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
86     1.019901
87     1.010000
88     1.010000
89     1.005921
90     1.000000
91     1.000000
92     1.000000
93     1.000000
94     1.000000
95     1.000000
96     1.000000
97     1.000000
98     1.000000
99     1.000000
100    1.000000
Length: 101, dtype: float64

These TimeSeries columns can be averaged normally, using data.nx.mean(), and behave as expected, but I hit trouble when I try to group the data.

grouped = data.groupby(['code', 'acc'])
means = grouped.mean()
print means
                       id          subject  reaction_time
code   acc                                               
group1 False  1570.866667  47474992.333333    1506.000000
       True   1337.076152  46022403.623246    1322.116232
group2 False  1338.180180  48730402.045045    1289.112613
       True   1382.631757  42713592.628378    1294.952703
group3 False  1488.587156  43202477.623853    1349.568807
       True   1310.415233  47054310.498771    1341.837838
group4 False  1339.682540  52530349.936508    1540.714286
       True   1343.261176  44606616.407059    1362.174118

Strangely, I can force it to average the TimeSeries data by iterating over the groups, and I may have to fall back on this hack, like so:

for name, group in grouped:
    print group.nx.mean()

0     0.000000
1     0.000000
2     0.000000
3     0.000000
4     0.000000
5     0.000667
6     0.000683
7     0.001952
8     0.002000
9     0.002000

{etc, 101 values for 6 groups}
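For what it's worth, I can collect the output of that loop into a single DataFrame keyed by group. This is just a rough sketch (it assumes pandas is imported as pd and that group.nx.mean() keeps returning a 101-point Series per group, as above), and it feels like working around groupby rather than with it:

# collect one 101-point mean trace per (code, acc) group into columns
group_means = pd.DataFrame(dict((name, group.nx.mean())
                                for name, group in grouped))
print group_means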

Finally, if I try to force the GroupBy object to average them, I get the following:

grouped.nx.mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-25-0b536a966e02> in <module>()
----> 1 grouped.nx.mean()

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0-py2.7-linux-i686.egg/pandas/core/groupby.pyc in mean(self)
    357         """
    358         try:
--> 359             return self._cython_agg_general('mean')
    360         except GroupByError:
    361             raise

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0-py2.7-linux-i686.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
    462 
    463         if len(output) == 0:
--> 464             raise DataError('No numeric types to aggregate')
    465 
    466         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Has anyone any ideas?

Upvotes: 1

Views: 1168

Answers (1)

Dan Allan

Reputation: 35265

A Series where each entry is itself a Series is not idiomatic. I think "No numeric types to aggregate" is telling you that pandas is trying to take the average of a list of Series (not the average of the numeric data they contain), which is not defined.

You should organize your data so nx and ny contain actual numbers. It might be simplest to keep nx, ny (and, I think, rx and ry) in a separate DataFrame, where each column corresponds to one id.
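Something along these lines might work as a starting point. It is only a sketch, assuming pandas is imported as pd, the id values are unique, and all the nx Series really do share the same index:

# one column per trial id; rows are the 101 time points
nx_wide = pd.concat(dict(zip(data.id, data.nx)), axis=1)

# elementwise mean of the traces belonging to each (code, acc) group
for name, group in data.groupby(['code', 'acc']):
    print name
    print nx_wide[list(group.id)].mean(axis=1)

With the traces in their own DataFrame, the grouping columns stay in data and you only carry the ids across.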

Upvotes: 4
