Common Sense

Reputation: 85

How to find the time difference between two events in groups

There is a data frame which contains the following information: the detector id, the channel id (each detector has several channels), a timestamp (an integer for simplicity), and the number of counts that occurred for a given (detector_id, channel_id) pair.

How do I calculate the number of days that have passed since the last nonzero event in a given (detector_id, channel_id) pair?

Here is an example:

import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

I tried to solve this in the following way:

df["diff"] = df["time"] - df.groupby(["detector_id", "channel_id"])['time'].diff()

It produces the following result (expected is the column I actually want to obtain):

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   1.0       NaN
2     3            0           1       0   2.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   5.0       4.0
6     7            1           1       0   NaN       NaN

As you can see, this solution doesn't take the counts column into account. The reference time should be reset whenever counts > 0 and carried forward otherwise.
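To make the rule concrete, here is a slow, loop-based sketch that produces the expected column (the expected_by_loop helper below is only for illustration; I'm looking for a vectorized way to do this):

import numpy as np

def expected_by_loop(df):
    # Spell out the rule as a plain loop over each (detector_id, channel_id) group.
    out = np.full(len(df), np.nan)
    for _, idx in df.groupby(["detector_id", "channel_id"]).groups.items():
        last_nonzero_time = None          # time of the last row with counts > 0
        for label in idx:
            if last_nonzero_time is not None:
                out[df.index.get_loc(label)] = df.at[label, "time"] - last_nonzero_time
            if df.at[label, "counts"] > 0:  # reset the reference time
                last_nonzero_time = df.at[label, "time"]
    return out

df["expected"] = expected_by_loop(df)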

Upvotes: 3

Views: 203

Answers (1)

Peter Leimbigler

Reputation: 11105

This should be close, but needs testing on your full data:

def f(subdf):
    # Take the times of rows with nonzero counts, align them back to the
    # group's full index, forward-fill, then shift so each row sees the
    # time of the *previous* nonzero event.
    ffilled = (subdf.loc[subdf['counts'] > 0, 'time']
               .reindex_like(subdf)
               .ffill()
               .shift())
    return subdf['time'] - ffilled

df['diff'] = (df.groupby(['detector_id', 'channel_id'])
                .apply(f)
                .sort_index(level=-1)  # restore the original row order
                .values)

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   NaN       NaN
2     3            0           1       0   1.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   4.0       4.0
6     7            1           1       0   NaN       NaN
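The same idea can also be written with transform instead of apply, which avoids re-sorting the index afterwards. This is only a sketch tested against the example frame above; the intermediate name ref is arbitrary:

# Keep the time only where counts > 0, then within each group forward-fill
# and shift so every row sees the previous nonzero event's time.
ref = df['time'].where(df['counts'] > 0)
df['diff'] = df['time'] - (ref.groupby([df['detector_id'], df['channel_id']])
                              .transform(lambda s: s.ffill().shift()))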

Upvotes: 2
