statboii
Reputation: 1

Python - Find mean difference between two events in a column grouping by a third column

I have a DataFrame that contains the opening and closing times of a web application, with an id on each row. How do I find the mean difference between opening and closing events for each id? My first instinct was to group by id, but I'm not sure what to do after that.

Here is a sample of the DataFrame I am working with:

id     event date_time_obj
 1     open   14:20:24
 1     close  14:24:01
 2     open   14:21:36
 2     close  14:27:56
 1     open   14:23:20
 1     close  14:25:35

I am stuck on what to do after calling df.groupby(). I want my final df to look like this:

id  avg_difference_secs
1    176 ((217+135)/2)
2    380

Upvotes: 0

Views: 38

Answers (1)

Derek Eden

Reputation: 4618

You could do it like this, shown here with an example df (assuming your times are already in a datetime format or some other workable numeric format):

import pandas as pd

# example data: each 'close' row directly follows its matching 'open' row
df = pd.DataFrame({'id':[1,1,2,2,1,1,2,2],
                   'event':['open','close','open','close','open','close','open','close'],
                   'time':[1,9,2,14,2,6,12,57]})
df

       id  event  time
    0   1   open     1
    1   1  close     9
    2   2   open     2
    3   2  close    14
    4   1   open     2
    5   1  close     6
    6   2   open    12
    7   2  close    57

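# diff() subtracts the previous row, so each 'close' row must directly follow its 'open' row
df['duration'] = df['time'].diff()
# select 'duration' before averaging so the non-numeric 'event' column is not aggregated
avgs = df[df['event']=='close'].groupby('id')['duration'].mean()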

avgs

id
1     6.0
2    28.5

EDIT: here's a more specific example using your exact df. If this doesn't work, the problem is likely in how you defined your df, e.g. cut/copied a slice from an existing...

df

   id  event date_time_obj
0   1   open      14:20:24
1   1  close      14:24:01
2   2   open      14:21:36
3   2  close      14:27:56
4   1   open      14:23:20
5   1  close      14:25:35

df['date_time_obj'][0]

datetime.time(14, 20, 24) #using this format based on the info in your OP

df['seconds'] = df['date_time_obj'].apply(lambda x: x.second + x.minute*60 + x.hour*3600)

If your time is in a different format, the above step may be easier or not needed. datetime.time has no method to convert to seconds, and datetime.time objects do not support direct addition/subtraction, hence the manual conversion.

df

   id  event date_time_obj  seconds
0   1   open      14:20:24    51624
1   1  close      14:24:01    51841
2   2   open      14:21:36    51696
3   2  close      14:27:56    52076
4   1   open      14:23:20    51800
5   1  close      14:25:35    51935
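
As a possible alternative to the lambda above, here is a minimal sketch that goes through pd.to_timedelta instead. It assumes the column holds datetime.time objects (or 'HH:MM:SS' strings); the names times and seconds are only for illustration:

```python
import datetime
import pandas as pd

# illustrative data, assuming the column holds datetime.time objects
times = pd.Series([datetime.time(14, 20, 24), datetime.time(14, 24, 1)])

# str(datetime.time(14, 20, 24)) is '14:20:24', which pd.to_timedelta can parse
seconds = pd.to_timedelta(times.astype(str)).dt.total_seconds()
```

This gives the seconds as floats rather than ints, which makes no difference for the diff/mean step.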

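# diff() subtracts the previous row, so each 'close' row must directly follow its 'open' row
df['duration'] = df['seconds'].diff()
# select 'duration' before averaging so the non-numeric columns are not aggregated
avgs = df[df['event']=='close'].groupby('id')['duration'].mean()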

avgs

id
1    176.0
2    380.0

This matches the desired output.
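
One caveat: a plain diff() only works when every close row sits directly below its own open row. If sessions from different ids can interleave, a sketch like the following may be safer, since it takes the diff within each id. It assumes rows are in chronological order and that each close is preceded by the open for the same id; the helper name mean_open_close_secs is made up for this example:

```python
import pandas as pd

def mean_open_close_secs(df, id_col="id", event_col="event", secs_col="seconds"):
    """Mean seconds between each 'close' and the previous row of the same id."""
    # diff within each id, so rows from other ids in between do not matter
    duration = df.groupby(id_col)[secs_col].diff()
    # keep only the close rows; their diff is close minus the preceding open
    closes = df[event_col] == "close"
    return duration[closes].groupby(df.loc[closes, id_col]).mean()

# same timestamps as in the question, but with ids 1 and 2 interleaved
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 1, 1],
    "event": ["open", "open", "close", "close", "open", "close"],
    "seconds": [51624, 51696, 51841, 52076, 51800, 51935],
})
avgs = mean_open_close_secs(df)
```

With this data avgs should again be 176.0 for id 1 and 380.0 for id 2, even though the rows for the two ids are mixed.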

Upvotes: 1
