Nani

Reputation: 260

Pandas Grouper calculate time elapsed between events

I am trying to compute the time elapsed between two events using pd.Grouper, but have not been able to get it working. The input and the expected output are below.

Input

ID   Status           Datetime
A    Online     24/09/2017  7:00:00 AM
A    Offline    24/09/2017  7:30:00 AM     
A    Offline    24/09/2017  8:30:00 AM
A    Online     24/09/2017  9:30:00 AM
A    Offline    24/09/2017  10:00:00 AM
B    Offline    24/09/2017  6:00:00 AM
B    Online     24/09/2017  7:30:00 AM     
B    Online     24/09/2017  9:10:00 AM
B    Offline    24/09/2017  9:30:00 AM
B    Online     24/09/2017  9:40:00 AM
B    Offline    24/09/2017  10:00:00 AM

Output

ID        Hour_start                  Hour_end              Online_time
A    24/09/2017  7:00:00 AM     24/09/2017  8:00:00 AM          1800
A    24/09/2017  8:00:00 AM     24/09/2017  9:00:00 AM           0
A    24/09/2017  9:00:00 AM     24/09/2017  10:00:00 AM         1800
B    24/09/2017  6:00:00 AM     24/09/2017  7:00:00 AM           0
B    24/09/2017  7:00:00 AM     24/09/2017  8:00:00 AM          1800
B    24/09/2017  8:00:00 AM     24/09/2017  9:00:00 AM          3600
B    24/09/2017  9:00:00 AM     24/09/2017  10:00:00 AM         3000

My attempt using Pandas Grouper:

df_output = df.groupby(['ID',pd.Grouper(key='Datetime', freq='H'),'status'])['event_time'].diff().dt.seconds.fillna(0)

But this doesn't take the Online/Offline values of the Status column into account.

Please help me out. TIA

Upvotes: 0

Views: 358

Answers (1)

Valdi_Bo

Reputation: 30971

I assume that the Datetime column in your source DataFrame is of datetime64 type.
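If the column still holds strings like those in the question, it can be converted first. A minimal sketch, assuming day-first dates with a 12-hour clock as in the sample; the `rows` list and the `format` string are my reading of the sample data, not something the question confirms:

```python
import pandas as pd

# Sample rows retyped from the question (single space between date and time)
rows = [
    ('A', 'Online',  '24/09/2017 7:00:00 AM'),
    ('A', 'Offline', '24/09/2017 7:30:00 AM'),
    ('A', 'Offline', '24/09/2017 8:30:00 AM'),
    ('A', 'Online',  '24/09/2017 9:30:00 AM'),
    ('A', 'Offline', '24/09/2017 10:00:00 AM'),
    ('B', 'Offline', '24/09/2017 6:00:00 AM'),
    ('B', 'Online',  '24/09/2017 7:30:00 AM'),
    ('B', 'Online',  '24/09/2017 9:10:00 AM'),
    ('B', 'Offline', '24/09/2017 9:30:00 AM'),
    ('B', 'Online',  '24/09/2017 9:40:00 AM'),
    ('B', 'Offline', '24/09/2017 10:00:00 AM'),
]
df = pd.DataFrame(rows, columns=['ID', 'Status', 'Datetime'])

# Assumed format: day/month/year, 12-hour clock with an AM/PM marker
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%d/%m/%Y %I:%M:%S %p')
print(df.dtypes)
```

Passing an explicit format avoids any guessing about whether 24/09 is day-first or month-first.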

My solution is based on 2-level grouping, first by ID and then (after some intermediate operations) by hour.

Define 2 functions:

  1. onTimeById, to compute Online time for each ID (the "external" grouping level):

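    def onTimeById(grp):
        # Keep only the rows where Status differs from the previous row
        wrk = grp[grp.Status != grp.Status.shift()]
        # Status as a Series indexed by Datetime
        wrk = wrk.set_index('Datetime').Status
        # Add hourly points from the first to the last timestamp and
        # forward-fill the status in effect at each of them
        wrk = wrk.reindex(wrk.index.union(pd.date_range(wrk.index.min(),
            wrk.index.max(), freq='H'))).ffill()
        # Online seconds for each hourly bucket
        res = wrk.groupby(pd.Grouper(freq='H')).apply(onTimeByHour)
        # Drop the last bucket (in the sample data it holds only the final
        # 10:00 row) and name the columns
        rv = res.iloc[:-1].reset_index().rename(columns={'index': 'Hour_start',
            'Status': 'Online_time'})
        rv.insert(1, 'Hour_end', res.index[1:])
        return rv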
    
  2. onTimeByHour, to compute Online time for each hour (the "internal" grouping level):

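    def onTimeByHour(grp2):
        if grp2.size > 1:
            # Gaps between consecutive timestamps within this hour
            dd = grp2.index.to_series().diff()
            # A gap that ends at an 'Offline' row was spent Online
            rv = dd[grp2 == 'Offline'].sum().seconds
            # If the hour ends Online, count the rest of the hour as well
            if grp2.iloc[-1] == 'Online':
                rv += 3600 - dd.sum().seconds
            return rv
        # Single row: the status at the start of the hour holds for the whole hour
        return 0 if grp2.iloc[0] == 'Offline' else 3600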
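
To illustrate the arithmetic inside onTimeByHour, here is a sketch for a single hourly bucket: ID B between 09:00 and 10:00, as it should look after the reindex and forward fill in onTimeById. The variable names are for illustration only, and it uses total_seconds() where the function uses .seconds:

```python
import pandas as pd

# B's 09:00 bucket: Online carried over at 09:00, Offline at 09:30, Online at 09:40
idx = pd.to_datetime(['2017-09-24 09:00', '2017-09-24 09:30', '2017-09-24 09:40'])
bucket = pd.Series(['Online', 'Offline', 'Online'], index=idx)

# Gaps between consecutive timestamps: NaT, 30 min, 10 min
gaps = bucket.index.to_series().diff()

# The gap ending at the 09:30 'Offline' row was spent Online
online = gaps[bucket == 'Offline'].sum()

# The bucket ends Online, so the remainder of the hour (09:40 to 10:00) counts too
tail = pd.Timedelta(hours=1) - gaps.sum()

total = int((online + tail).total_seconds())
print(total)
```

This gives 30 minutes plus 20 minutes, i.e. the 3000 seconds expected for that row.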
    

Then run:

res = df.groupby('ID').apply(onTimeById).reset_index(level=0).reset_index(drop=True)

The result, for your source data, is:

  ID          Hour_start            Hour_end  Online_time
0  A 2017-09-24 07:00:00 2017-09-24 08:00:00         1800
1  A 2017-09-24 08:00:00 2017-09-24 09:00:00            0
2  A 2017-09-24 09:00:00 2017-09-24 10:00:00         1800
3  B 2017-09-24 06:00:00 2017-09-24 07:00:00            0
4  B 2017-09-24 07:00:00 2017-09-24 08:00:00         1800
5  B 2017-09-24 08:00:00 2017-09-24 09:00:00         3600
6  B 2017-09-24 09:00:00 2017-09-24 10:00:00         3000
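
As an independent cross-check of these figures, here is a sketch that uses a different technique from the one above: it turns each row into a [start, end) span that lasts until the next row and measures how much of every Online span overlaps each hour. The function name online_seconds_per_hour and the way the sample data is typed in are my own assumptions, and it is not tuned for large inputs:

```python
import pandas as pd

# Sample events from the question, all on 2017-09-24
times = {
    'A': [('Online', '07:00'), ('Offline', '07:30'), ('Offline', '08:30'),
          ('Online', '09:30'), ('Offline', '10:00')],
    'B': [('Offline', '06:00'), ('Online', '07:30'), ('Online', '09:10'),
          ('Offline', '09:30'), ('Online', '09:40'), ('Offline', '10:00')],
}
rows = [(id_, status, '2017-09-24 ' + t)
        for id_, events in times.items() for status, t in events]
df = pd.DataFrame(rows, columns=['ID', 'Status', 'Datetime'])
df['Datetime'] = pd.to_datetime(df['Datetime'])

def online_seconds_per_hour(grp):
    """Hypothetical helper: Online seconds per clock hour for one ID."""
    grp = grp.sort_values('Datetime')
    # Each status holds from its own timestamp until the next row's timestamp;
    # the last row has no end and is dropped
    spans = pd.DataFrame({
        'start': grp['Datetime'],
        'end': grp['Datetime'].shift(-1),
        'Status': grp['Status'],
    }).dropna()
    online = spans[spans['Status'] == 'Online']
    hours = pd.date_range(grp['Datetime'].min().floor('h'),
                          grp['Datetime'].max().ceil('h'), freq='h')
    out = {}
    for h0, h1 in zip(hours[:-1], hours[1:]):
        # Clip every Online span to the hour [h0, h1); negative lengths mean no overlap
        start = online['start'].where(online['start'] > h0, h0)
        end = online['end'].where(online['end'] < h1, h1)
        secs = (end - start).dt.total_seconds().clip(lower=0)
        out[h0] = int(secs.sum())
    return pd.Series(out)

result = {id_: online_seconds_per_hour(g) for id_, g in df.groupby('ID')}
for id_, s in result.items():
    print(id_, s.tolist())
```

For the sample data this reproduces the Online_time column: 1800, 0, 1800 for A and 0, 1800, 3600, 3000 for B.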

To understand in detail how this solution works, save group "A" in a variable, e.g. by running:

gr = df.groupby('ID')
grp = gr.get_group('A')

Then execute each instruction from onTimeById and see the results.

Apply the same approach to trace how onTimeByHour works.
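
The first two steps of that trace can be sketched as below. It rebuilds group "A" by hand so it runs on its own, and it uses the lowercase 'h' alias, which recent pandas releases prefer over 'H'; apart from that, the statements mirror those in onTimeById:

```python
import pandas as pd

# Group "A" from the question, with Datetime already parsed
grp = pd.DataFrame({
    'ID': ['A'] * 5,
    'Status': ['Online', 'Offline', 'Offline', 'Online', 'Offline'],
    'Datetime': pd.to_datetime([
        '2017-09-24 07:00', '2017-09-24 07:30', '2017-09-24 08:30',
        '2017-09-24 09:30', '2017-09-24 10:00']),
})

# Step 1: keep only the rows where Status differs from the previous row
# (the repeated Offline at 08:30 disappears)
wrk = grp[grp.Status != grp.Status.shift()]
print(wrk)

# Step 2: index by time, add the hourly marks, carry the last status forward
wrk = wrk.set_index('Datetime').Status
hours = pd.date_range(wrk.index.min(), wrk.index.max(), freq='h')
wrk = wrk.reindex(wrk.index.union(hours)).ffill()
print(wrk)
```

After step 2 every hour starts with a row that states which status was in effect at that moment, which is what lets the per-hour function work on each bucket in isolation.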

Upvotes: 1
