Reputation: 1
I have a DataFrame that contains the opening and closing times of a web application, with several open/close events per id. How do I find the mean difference between opening and closing events for each id? My first instinct was to group by id, but I'm not sure what to do after that.
Here is a sample of the DataFrame I am working with:
id event date_time_obj
1 open 14:20:24
1 close 14:24:01
2 open 14:21:36
2 close 14:27:56
1 open 14:23:20
1 close 14:25:35
I am stuck on what to do after the df.groupby() function. I want my final df to look like this:
id avg_difference_secs
1 176 ((217+135)/2)
2 380
Upvotes: 0
Views: 38
Reputation: 4618
You could do it like this, shown here with an example df (assuming your times are already in a datetime format or some other format you can do arithmetic on):
import pandas as pd

df = pd.DataFrame({'id':[1,1,2,2,1,1,2,2],
'event':['open','close','open','close','open','close','open','close'],
'time':[1,9,2,14,2,6,12,57]})
df
id event time
0 1 open 1
1 1 close 9
2 2 open 2
3 2 close 14
4 1 open 2
5 1 close 6
6 2 open 12
7 2 close 57
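# diff() subtracts the previous row, so each 'close' row holds close - open
df['duration'] = df['time'].diff()
# select 'duration' before averaging so the text column 'event' is not aggregated
avgs = df[df['event']=='close'].groupby('id')['duration'].mean()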
avgs
id
1 6.0
2 28.5
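Note that the plain diff() above only works because every close row sits directly below its own open row. As a minimal sketch for the case where rows from different ids are interleaved, the diff can be taken within each id instead. The four-row frame below is made-up illustration data, not the asker's, and it assumes each id still alternates open, close in row order:

```python
import pandas as pd

# Hypothetical data: the two ids are interleaved, so a plain diff()
# would subtract rows belonging to different ids.
df = pd.DataFrame({'id': [1, 2, 1, 2],
                   'event': ['open', 'open', 'close', 'close'],
                   'time': [1, 2, 9, 14]})

# Take the difference to the previous row *of the same id*.
df['duration'] = df.groupby('id')['time'].diff()

# Keep only the close rows (close - open) and average per id.
avgs = df.loc[df['event'] == 'close'].groupby('id')['duration'].mean()
print(avgs)
```

Here id 1 gets 9 - 1 = 8 and id 2 gets 14 - 2 = 12, whereas a plain diff() on this frame would pair each close with whatever row happened to precede it.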
EDIT: here's a more specific example using your exact df. If this doesn't work, then it's likely how you defined your df cut/copied a slice from an existing...
df
id event date_time_obj
0 1 open 14:20:24
1 1 close 14:24:01
2 2 open 14:21:36
3 2 close 14:27:56
4 1 open 14:23:20
5 1 close 14:25:35
df['date_time_obj'][0]
datetime.time(14, 20, 24) #using this format based on the info in your OP
df['seconds'] = df['date_time_obj'].apply(lambda x: x.second + x.minute*60 + x.hour*3600)
If your time is in a different format, the above step may be easier or not needed. datetime.time has no method to convert to seconds, and datetime.time objects do not support direct addition/subtraction.
df
id event date_time_obj seconds
0 1 open 14:20:24 51624
1 1 close 14:24:01 51841
2 2 open 14:21:36 51696
3 2 close 14:27:56 52076
4 1 open 14:23:20 51800
5 1 close 14:25:35 51935
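As an alternative to the lambda above, the seconds column can probably also be built by going through timedeltas. This is only a sketch, and it assumes the column holds naive datetime.time objects (no tzinfo), whose string form 'HH:MM:SS' is something pd.to_timedelta can parse:

```python
import datetime
import pandas as pd

# Two of the times from the example df, as datetime.time objects.
df = pd.DataFrame({'date_time_obj': [datetime.time(14, 20, 24),
                                     datetime.time(14, 24, 1)]})

# str(datetime.time) gives 'HH:MM:SS'; to_timedelta turns that into a
# duration since midnight, and total_seconds() returns it as a float.
as_text = df['date_time_obj'].astype(str)
df['seconds'] = pd.to_timedelta(as_text).dt.total_seconds()
print(df)
```

The result is a float column (51624.0, 51841.0) rather than the integers produced by the lambda, which makes no difference for the mean.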
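# diff() subtracts the previous row, so each 'close' row holds close - open
df['duration'] = df['seconds'].diff()
# select 'duration' before averaging so non-numeric columns are not aggregated
avgs = df[df['event']=='close'].groupby('id')['duration'].mean()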
avgs
id
1 176.0
2 380.0
which is the desired output.
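If you want the result as a two-column df named like the one in your question rather than as a Series, a small sketch along these lines should work. It starts from the seconds values in the table above, and the column name avg_difference_secs is taken from your desired output:

```python
import pandas as pd

# Seconds since midnight, copied from the 'seconds' column shown above.
df = pd.DataFrame({'id': [1, 1, 2, 2, 1, 1],
                   'event': ['open', 'close', 'open', 'close', 'open', 'close'],
                   'seconds': [51624, 51841, 51696, 52076, 51800, 51935]})

df['duration'] = df['seconds'].diff()

# Average the close rows per id, name the column, and turn the
# 'id' index back into an ordinary column.
result = (df[df['event'] == 'close']
          .groupby('id')['duration']
          .mean()
          .rename('avg_difference_secs')
          .reset_index())
print(result)
```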
Upvotes: 1