Reputation: 11
I have a data frame whose index is an hourly timestamp and whose only column is counts. It looks like the following table:
date counts
2017-03-31 00:00:00+00:00 0.0
2017-03-31 01:00:00+00:00 0.0
2017-03-31 02:00:00+00:00 0.0
2017-03-31 03:00:00+00:00 0.0
2017-03-31 04:00:00+00:00 0.0
... ...
2022-06-19 19:00:00+00:00 6.0
2022-06-19 20:00:00+00:00 6.0
2022-06-19 21:00:00+00:00 1.0
2022-06-19 22:00:00+00:00 1.0
2022-06-19 23:00:00+00:00 1.0
If there are 15 consecutive hours of zero counts, they are considered an error and I want to flag them. The data frame is not complete: there are missing dates (gaps) in the data.
I tried resampling the data frame into 15-hour bins and looking for bins whose sum is zero, but that didn't give me the correct answer.
Upvotes: 1
Views: 793
Reputation: 260640
Assuming the dates are sorted, group the successive zeros, count the zeros in each group, and flag the zero rows of any group that has ≥ 15 of them:
m = df['counts'].ne(0)  # True where the count is non-zero
c = (~m).astype(int).groupby(m.cumsum()).transform('sum')  # zeros per run
df['error'] = c.ge(15) & ~m
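The grouping approach above counts rows rather than hours, so a run of zeros that spans a missing timestamp would still be treated as one run. Since the question mentions gaps, here is a minimal sketch that also breaks a run wherever two neighbouring rows are not exactly one hour apart. The function name `flag_zero_runs` and its parameters `min_len` and `freq` are made up for illustration, and it assumes a sorted `DatetimeIndex` without duplicates.

```python
import pandas as pd

def flag_zero_runs(df, min_len=15, freq="1h"):
    """Flag rows that belong to a run of at least min_len consecutive hourly zeros."""
    is_zero = df["counts"].eq(0)
    # A row starts a new run if it is non-zero or if it does not follow
    # the previous row by exactly one step (the first row always does).
    gap = df.index.to_series().diff().ne(pd.Timedelta(freq))
    run_id = (~is_zero | gap).cumsum()
    # Count only the zeros inside each run.
    run_len = is_zero.astype(int).groupby(run_id).transform("sum")
    return is_zero & run_len.ge(min_len)

# Toy data: 3 non-zero hours, 16 zeros, 1 non-zero hour, 20 zeros.
idx = pd.Timestamp("2022-01-01", tz="UTC") + pd.to_timedelta(list(range(40)), unit="h")
df = pd.DataFrame({"counts": [1.0, 2.0, 3.0] + [0.0] * 16 + [5.0] + [0.0] * 20}, index=idx)
# Remove one hour from the second zero run so it splits into runs of 10 and 9.
df = df.drop(df.index[30])

flagged = flag_zero_runs(df)
df["error"] = flagged
```

In this toy data only the first run of 16 zeros is flagged; the second run is cut by the missing hour into pieces shorter than 15, so it is left alone. Whether a gap should break a run is a judgment call that depends on what the missing rows mean in your data.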
Upvotes: 1
Reputation: 93161
If counts is guaranteed to be non-negative, you can use rolling and check for the max value:
df["is_error"] = df["counts"].rolling(15).max() == 0
If counts can be negative, you have to check both min and max:
r = df["counts"].rolling(15)
df["is_error"] = r.min().eq(0) & r.max().eq(0)
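Two things are worth keeping in mind with the rolling approach: `rolling(15)` looks at 15 rows rather than 15 hours, and the result is True only on the row that ends a window, not on the 14 zero rows before it. The sketch below is one possible variation, not the answer's own method: it uses a time-based window of `"15h"`, requires the window to hold 15 rows so gaps disqualify it, and then spreads the flag back over the whole run. The name `flag_with_rolling` and the `hours` parameter are hypothetical, and a sorted `DatetimeIndex` is assumed.

```python
import pandas as pd

def flag_with_rolling(df, hours=15):
    """Flag every row inside a gap-free window of `hours` hourly zeros."""
    win = df["counts"].rolling(f"{hours}h")  # time-based window ending at each row
    full = win.count().eq(hours)             # no missing hours inside the window
    all_zero = win.min().eq(0) & win.max().eq(0)
    run_end = full & all_zero                # True on the last row of a bad window
    # Spread each hit back over the rows of its window by rolling over the
    # reversed series; a full window is exactly `hours` consecutive rows.
    spread = run_end[::-1].astype(int).rolling(hours, min_periods=1).max()[::-1]
    return spread.eq(1)

# Toy data: 3 non-zero hours, 16 zeros, 1 non-zero hour, 20 zeros.
idx = pd.Timestamp("2022-01-01", tz="UTC") + pd.to_timedelta(list(range(40)), unit="h")
df = pd.DataFrame({"counts": [1.0, 2.0, 3.0] + [0.0] * 16 + [5.0] + [0.0] * 20}, index=idx)
# Remove one hour from the second zero run to create a gap.
df = df.drop(df.index[30])

is_error = flag_with_rolling(df)
df["is_error"] = is_error
```

Here the 16 zeros in the first run are all flagged, while the second run is not, because no 15-hour window inside it is free of gaps.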
Upvotes: 1