Reputation: 85
There is a data frame containing the following information: a detector id, a channel id (each detector has several channels), a timestamp (an integer, for simplicity), and the number of counts that occurred in a given (detector_id, channel_id) pair.
How do I calculate the number of days passed since the last nonzero event in a given (detector_id, channel_id) pair?
Here is an example:
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})
I tried to solve this in the following way:
df["diff"] = df["time"] - df.groupby(["detector_id", "channel_id"])['time'].diff()
It produces the following result:
   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   1.0       NaN
2     3            0           1       0   2.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   5.0       4.0
6     7            1           1       0   NaN       NaN
As you can see, this solution doesn't take the counts column into account. The difference should reset whenever counts > 0 and propagate otherwise.
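In other words, within each (detector_id, channel_id) group the time of the most recent *previous* nonzero row has to be carried forward and subtracted from the current time. A sketch of that idea on a single group from the example above (the helper name `last_event` is mine):

```python
import pandas as pd

# the (detector_id=0, channel_id=1) rows from the example
g = pd.DataFrame({"time": [1, 2, 3, 5, 6],
                  "counts": [0, 1, 0, 0, 1]})

# keep the time only on rows with counts > 0, carry it forward,
# then shift by one so each row sees the previous event's time
last_event = g["time"].where(g["counts"] > 0).ffill().shift()
g["diff"] = g["time"] - last_event
# rows before the first nonzero event stay NaN
```

Rows 3, 5 and 6 of this group come out as 1.0, 3.0 and 4.0, matching the expected column.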
Upvotes: 3
Views: 203
Reputation: 11105
This should be close, but needs testing on your full data:
def f(subdf):
    ffilled = (subdf.loc[subdf['counts'] > 0, 'time']
                    .reindex_like(subdf)
                    .ffill()
                    .shift())
    return subdf['time'] - ffilled

df['diff'] = (df.groupby(['detector_id', 'channel_id'])
                .apply(f)
                .sort_index(level=-1)
                .values)
   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   NaN       NaN
2     3            0           1       0   1.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   4.0       4.0
6     7            1           1       0   NaN       NaN
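The same logic can also be written without a helper function using groupby(...).transform, which keeps the result aligned with the original index so no re-sorting is needed (a sketch, only checked against the sample data; it rebuilds the sample frame to be self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

# within each group: keep the time only on nonzero rows, carry it
# forward, then shift so each row sees the previous event's time
last_event = (df["time"].where(df["counts"] > 0)
                .groupby([df["detector_id"], df["channel_id"]])
                .transform(lambda s: s.ffill().shift()))
df["diff"] = df["time"] - last_event
```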
Upvotes: 2