I.Koropetskyi
I.Koropetskyi

Reputation: 31

Aggregations for Timedelta values in the Python DataFrame

I have big DataFrame (df) which looks like:

  Acc_num date_diff
0   29  0:04:43
1   29  0:01:43
2   29  2:22:45
3   29  0:16:21
4   29  0:58:20
5   30  0:00:35
6   34  7:15:26
7   34  4:40:01
8   34  0:56:02
9   34  6:53:44
10  34  1:36:58
.....
Acc_num                    int64
date_diff        timedelta64[ns]
dtype: object

I need to calculate 'date_diff' mean (in timedelta format) for each account number.
df.date_diff.mean() works correctly. But when I try next:
df.groupby('Acc_num').date_diff.mean() it raises an exception:

"DataError: No numeric types to aggregate"

I also tried df.pivot_table() method, but didn't acheive anything.

Could someone help me with this stuff. Thank you in advance!

Upvotes: 3

Views: 3288

Answers (1)

Adrien Matissart
Adrien Matissart

Reputation: 1690

Weird limitation indeed. But a simple solution would be:

df.groupby('Acc_num').date_diff.agg(lambda g:g.sum()/g.count())

Edit:
Pandas will actually attempt to aggregate non-numeric columns if you pass numeric_only=False

df.groupby('Acc_num').date_diff.mean(numeric_only=False)

Upvotes: 7

Related Questions