Issue with Groupby function in Pandas

Question

I am trying to take a dataframe, which has timestamps and various other fields, and group by rounded timestamps (to the nearest minute), and take the average of another field. I'm getting the error no numeric value to aggregate

I am rounding the timestamp column like such:

df['time'] = df['time'].dt.round('1min')

The aggregation column is of the form: 0 days 00:00:00.054000

df3 = (
    df
    .groupby([df['time']])['total_diff'].mean()
    .unstack(fill_value=0)
    .reset_index()
)

I realize the total_diff column is a time delta field, but I would have thought this would still be considered numerical?

My ideal output would have the following columns: Rounded timestamp, number of records that are grouped in that timestamp, average total_diff by rounded timestamp. How can I achieve this?

EDIT Example row:

[index, id, time, total_diff]
[400, 5fdfe9242c2fb0da04928d55, 2020-12-21 00:16:00, 0 days 00:00:00.055000]
[401, 5fdfe9242c2fb0da04928d56, 2020-12-21 00:16:00, 0 days 00:00:00.01000]
[402, 5fdfe9242c2fb0da04928d57, 2020-12-21 00:15:00, 0 days 00:00:00.05000]

The time column is not unique. I want to group by the time column, count the number of rows that are grouped into each time bucket, and produce an average of the total_diff for each time bucket.

Desired outcome:

[time, count, avg_total_diff]
[2020-12-21 00:16:00, 2, .0325]

Issue with Groupby function in Pandas

Answers (1)

Sample Data

Code

Related Questions