Reputation: 885
I am trying to take a dataframe, which has timestamps and various other fields, and group by rounded timestamps (to the nearest minute), and take the average of another field. I'm getting the error no numeric value to aggregate
I am rounding the timestamp column like such:
df['time'] = df['time'].dt.round('1min')
The aggregation column is of the form: 0 days 00:00:00.054000
df3 = (
df
.groupby([df['time']])['total_diff'].mean()
.unstack(fill_value=0)
.reset_index()
)
I realize the total_diff column is a time delta field, but I would have thought this would still be considered numerical?
My ideal output would have the following columns: Rounded timestamp, number of records that are grouped in that timestamp, average total_diff by rounded timestamp. How can I achieve this?
EDIT Example row:
[index, id, time, total_diff]
[400, 5fdfe9242c2fb0da04928d55, 2020-12-21 00:16:00, 0 days 00:00:00.055000]
[401, 5fdfe9242c2fb0da04928d56, 2020-12-21 00:16:00, 0 days 00:00:00.01000]
[402, 5fdfe9242c2fb0da04928d57, 2020-12-21 00:15:00, 0 days 00:00:00.05000]
The time column is not unique. I want to group by the time column, count the number of rows that are grouped into each time bucket, and produce an average of the total_diff for each time bucket.
Desired outcome:
[time, count, avg_total_diff]
[2020-12-21 00:16:00, 2, .0325]
Upvotes: 0
Views: 6901
Reputation: 59579
By default DataFrame.groupby.mean
has numeric_only=True
, and numeric only considers int
, bool
and float
. To also work with timedelta64[ns]
you must set this to False
.
import pandas as pd
df = pd.DataFrame(pd.date_range('2010-01-01', freq='2T', periods=6))
df[1] = df[0].diff().bfill()
# 0 1
#0 2010-01-01 00:00:00 0 days 00:02:00
#1 2010-01-01 00:02:00 0 days 00:02:00
#2 2010-01-01 00:04:00 0 days 00:02:00
#3 2010-01-01 00:06:00 0 days 00:02:00
#4 2010-01-01 00:08:00 0 days 00:02:00
#5 2010-01-01 00:10:00 0 days 00:02:00
df.dtypes
#0 datetime64[ns]
#1 timedelta64[ns]
#dtype: object
df.groupby(df[0].round('5T'))[1].mean()
#DataError: No numeric types to aggregate
df.groupby(df[0].round('5T'))[1].mean(numeric_only=False)
#0
#2010-01-01 00:00:00 0 days 00:02:00
#2010-01-01 00:05:00 0 days 00:02:00
#2010-01-01 00:10:00 0 days 00:02:00
#Name: 1, dtype: timedelta64[ns]
Upvotes: 3