Reputation: 767
I have data in the following format in a pandas DataFrame. I would like to compute the average duration of the events in every 30-minute or 1-hour window.
id begin_time end_time
499078360 2019-08-30 13:26:04.124235 2019-08-30 13:42:23.680142
499074090 2019-08-30 13:21:57.685308 2019-08-30 13:39:57.993772
499084485 2019-08-30 13:32:11.533709 2019-08-30 13:45:45.307579
499088441 2019-08-30 13:36:06.971633 2019-08-30 13:48:42.160393
499088460 2019-08-30 13:36:07.935704 2019-08-30 13:48:30.037312
This is how I counted the events happening in each 5-minute window:
enter_count = df['begin_time'].value_counts()
exit_count = df["end_time"].value_counts()
df2 = pd.concat([enter_count, exit_count], axis=1, keys=["enter", "exit"])
df2.fillna(0, inplace=True)
df2["diff"] = df2["enter"] - df2["exit"]
counts = df2["diff"].resample("5min").sum().fillna(0).cumsum()
What I want instead is the average duration of the events in each 30-minute or 1-hour window.
Any suggestions would be appreciated.
EDIT:
Sample response expected:
Time window Average Time of the event (minutes)
2019-08-30 13:00:00 18:10
2019-08-30 13:30:00 35:00
2019-08-30 14:00:00 17:00
This only illustrates the format; the values are not the exact ones expected.
Upvotes: 0
Views: 1036
Reputation: 30971
Start by computing an additional column, the duration in minutes (expressed as a float):
df['durMin'] = (df.end_time - df.begin_time) / pd.offsets.Minute()
For your sample data the result is:
id begin_time end_time durMin
0 499078360 2019-08-30 13:26:04.124235 2019-08-30 13:42:23.680142 16.325932
1 499074090 2019-08-30 13:21:57.685308 2019-08-30 13:39:57.993772 18.005141
2 499084485 2019-08-30 13:32:11.533709 2019-08-30 13:45:45.307579 13.562898
3 499088441 2019-08-30 13:36:06.971633 2019-08-30 13:48:42.160393 12.586479
4 499088460 2019-08-30 13:36:07.935704 2019-08-30 13:48:30.037312 12.368360
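If begin_time and end_time are still stored as strings, the subtraction above will fail. A minimal sketch of the same step, assuming the columns need parsing first and using .dt.total_seconds() as an alternative way to get float minutes (the DataFrame construction is only there to make the example self-contained):

```python
import pandas as pd

# Sample rows copied from the question; the time columns start out as text.
df = pd.DataFrame({
    "id": [499078360, 499074090, 499084485, 499088441, 499088460],
    "begin_time": [
        "2019-08-30 13:26:04.124235",
        "2019-08-30 13:21:57.685308",
        "2019-08-30 13:32:11.533709",
        "2019-08-30 13:36:06.971633",
        "2019-08-30 13:36:07.935704",
    ],
    "end_time": [
        "2019-08-30 13:42:23.680142",
        "2019-08-30 13:39:57.993772",
        "2019-08-30 13:45:45.307579",
        "2019-08-30 13:48:42.160393",
        "2019-08-30 13:48:30.037312",
    ],
})

# Parse the text columns into datetime64 so they can be subtracted.
df["begin_time"] = pd.to_datetime(df["begin_time"])
df["end_time"] = pd.to_datetime(df["end_time"])

# Timedelta -> seconds as float -> minutes as float.
df["durMin"] = (df["end_time"] - df["begin_time"]).dt.total_seconds() / 60
```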
Then, to get the result, run:
mt = df.set_index('begin_time').durMin.resample('30min').mean()
The result is:
begin_time
2019-08-30 13:00:00 17.165536
2019-08-30 13:30:00 12.839246
Freq: 30T, Name: durMin, dtype: float64
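Since the question asks for either 30-minute or 1-hour windows, here is a rough sketch that makes the window a parameter. It uses groupby with pd.Grouper instead of set_index plus resample, which should give the same buckets; mean_duration is a hypothetical helper name, not something from pandas. Events are assigned to a window by their begin_time, and windows with no events come out as NaN.

```python
import pandas as pd

# Sample rows copied from the question (id column omitted for brevity).
df = pd.DataFrame({
    "begin_time": pd.to_datetime([
        "2019-08-30 13:26:04.124235",
        "2019-08-30 13:21:57.685308",
        "2019-08-30 13:32:11.533709",
        "2019-08-30 13:36:06.971633",
        "2019-08-30 13:36:07.935704",
    ]),
    "end_time": pd.to_datetime([
        "2019-08-30 13:42:23.680142",
        "2019-08-30 13:39:57.993772",
        "2019-08-30 13:45:45.307579",
        "2019-08-30 13:48:42.160393",
        "2019-08-30 13:48:30.037312",
    ]),
})
df["durMin"] = (df["end_time"] - df["begin_time"]).dt.total_seconds() / 60


def mean_duration(frame, window):
    """Hypothetical helper: mean of durMin per window, bucketed by begin_time."""
    grouper = pd.Grouper(key="begin_time", freq=window)
    return frame.groupby(grouper)["durMin"].mean()


half_hourly = mean_duration(df, "30min")  # two windows: 13:00 and 13:30
hourly = mean_duration(df, "60min")       # one window: 13:00
```

With the sample data, all five events begin between 13:00 and 14:00, so the hourly result has a single row.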
One small difference from your expected output: you wanted the result formatted as mm:ss. If that matters, you can convert the float minutes to a timedelta.
You can do it with a single instruction:
pd.to_timedelta(mt, unit='m')
getting:
begin_time
2019-08-30 13:00:00 00:17:09.932185
2019-08-30 13:30:00 00:12:50.354746
Freq: 30T, Name: durMin, dtype: timedelta64[ns]
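If you want plain mm:ss strings, as in the sample output from the question, one possible approach is to round to whole seconds and format by hand. This is only a sketch: to_mmss is a hypothetical helper, the Series is rebuilt from the values printed above so the example runs on its own, and durations of an hour or more would simply show more than 59 minutes.

```python
import pandas as pd

# Mean durations in minutes, copied from the resample output in this answer.
mt = pd.Series(
    [17.165536, 12.839246],
    index=pd.to_datetime(["2019-08-30 13:00:00", "2019-08-30 13:30:00"]),
    name="durMin",
)


def to_mmss(minutes):
    """Hypothetical helper: render float minutes as an 'mm:ss' string."""
    if pd.isna(minutes):
        # Empty windows produce NaN; show them as an empty string.
        return ""
    total_seconds = int(round(minutes * 60))
    mm, ss = divmod(total_seconds, 60)
    return f"{mm:02d}:{ss:02d}"


formatted = mt.map(to_mmss)  # values: "17:10" and "12:50"
```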
Upvotes: 4