Sparkling_Sayena

Reputation: 11

How to find consecutive zeros in time series

I have a data frame whose index is an hourly datetime and whose only column is counts. It looks like the following table:

 date                       counts
2017-03-31 00:00:00+00:00   0.0
2017-03-31 01:00:00+00:00   0.0
2017-03-31 02:00:00+00:00   0.0
2017-03-31 03:00:00+00:00   0.0
2017-03-31 04:00:00+00:00   0.0
... ...
2022-06-19 19:00:00+00:00   6.0
2022-06-19 20:00:00+00:00   6.0
2022-06-19 21:00:00+00:00   1.0
2022-06-19 22:00:00+00:00   1.0
2022-06-19 23:00:00+00:00   1.0

If there are 15 consecutive hours of zero counts, they are considered an error and I want to flag them. The data frame is not complete: some dates are missing (there are gaps in the data).

I tried resampling the data frame into 15-hour bins and looking for bins whose sum is zero, but that didn't give me the correct answer.
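One plausible reason the resampling attempt fails can be shown on small hypothetical data (the series below is made up for illustration). Fixed 15-hour bins only catch a zero run that happens to line up with a bin, and with the default settings a bin that contains no rows at all also sums to 0, so gaps in the data can look like errors.

```python
import pandas as pd

# Hypothetical hourly data: ones, except a run of 15 zeros at hours 5..19
idx = pd.date_range("2017-03-31", periods=30, freq="h", tz="UTC")
counts = pd.Series(1.0, index=idx)
counts.iloc[5:20] = 0.0

# Fixed 15-hour bins start at 00:00 and 15:00, so the zero run straddles a bin edge
binned = counts.resample("15h").sum()
print(binned.tolist())  # [5.0, 10.0]: no bin sums to zero, so the run is missed

# Gaps cause the opposite problem: a bin with no rows at all also sums to 0
idx2 = pd.date_range("2017-03-31", periods=45, freq="h", tz="UTC")
gappy = pd.Series(1.0, index=idx2).drop(idx2[15:30])  # hours 15..29 are missing
gappy_binned = gappy.resample("15h").sum()
print(gappy_binned.tolist())  # [15.0, 0.0, 15.0]: the empty bin looks like an error
```

Counting the length of each run of zeros directly avoids both problems, because it does not depend on where bin edges fall.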

Upvotes: 1

Views: 793

Answers (2)

mozway

Reputation: 260640

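Assuming the dates are sorted, group successive zeros together (each non-zero row starts a new group), count the zeros in each group, and flag the zero rows of any group with ≥ 15 zeros as True:

# True on non-zero rows; the cumulative sum gives one group id per run of zeros
m = df['counts'].ne(0)
# number of zeros in each group, broadcast back to every row of that group
c = df['counts'].eq(0).astype(int).groupby(m.cumsum()).transform('sum')

# flag only the zero rows of runs with at least 15 zeros
df['error'] = c.ge(15) & ~m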
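This answer counts rows, not hours, so with missing hours two separate zero stretches can be merged into one run. Since the question says there are gaps, here is a minimal sketch that also starts a new run wherever consecutive timestamps are not exactly one hour apart. It assumes a sorted DatetimeIndex and that a missing hour breaks a run rather than counting as zero; the function name flag_zero_runs and the example data are hypothetical.

```python
import pandas as pd


def flag_zero_runs(counts: pd.Series, min_hours: int = 15) -> pd.Series:
    """Return True on zero rows in a run of >= min_hours consecutive hourly zeros.

    Assumes counts has a sorted DatetimeIndex and that a missing hour ends a run.
    """
    is_zero = counts.eq(0)
    # True where the previous row is not exactly one hour earlier (first row: NaT -> True)
    gap = counts.index.to_series().diff().ne(pd.Timedelta(hours=1))
    # A new run id starts on every non-zero row and after every gap
    run_id = ((~is_zero) | gap).cumsum()
    # Number of zero rows in each run, broadcast back to every row of the run
    run_len = is_zero.astype(int).groupby(run_id).transform("sum")
    return is_zero & run_len.ge(min_hours)


# Hypothetical example data
idx = pd.date_range("2017-03-31", periods=60, freq="h", tz="UTC")
s = pd.Series(1.0, index=idx)
s.iloc[0:15] = 0.0    # 15 zeros at the start: flagged
s.iloc[16:30] = 0.0   # only 14 zeros: not flagged
s.iloc[31:52] = 0.0   # 21 zeros ...
s = s.drop(idx[41])   # ... but one missing hour splits them into two runs of 10
flags = flag_zero_runs(s)
print(int(flags.sum()))  # 15
```

On the question's frame this would be used as df['error'] = flag_zero_runs(df['counts']). If missing hours should instead count as zeros, reindexing to a complete hourly range and filling with 0 before flagging would be the alternative.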

Upvotes: 1

Code Different

Reputation: 93161

If counts is guaranteed to be non-negative, you can use a 15-row rolling window and check whether its maximum is 0. This marks the last row of every all-zero window:

df["is_error"] = df["counts"].rolling(15).max() == 0
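The rolling check flags only the row at the end of each all-zero 15-row window, so the first 14 zeros of a run stay False. If every row of the run should be flagged, one option (my own addition, not part of the answer) is to spread each window-end flag back over its window by taking a rolling max on the reversed series. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: 20 hourly zeros followed by 5 ones
idx = pd.date_range("2017-03-31", periods=25, freq="h", tz="UTC")
counts = pd.Series([0.0] * 20 + [1.0] * 5, index=idx)

# True on the LAST row of every all-zero 15-row window
window_end = counts.rolling(15).max().eq(0)
print(int(window_end.sum()))  # 6 (rows 14..19)

# A row lies in some all-zero window if any of the next 15 rows
# (itself included) is a window end: reverse, take a rolling max, reverse back
covered = (
    window_end[::-1].astype(float).rolling(15, min_periods=1).max()[::-1].astype(bool)
)
print(int(covered.sum()))  # 20
```

Like the original, this counts rows rather than hours, so it assumes there are no gaps inside a run.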

If counts can be negative, you have to check both min and max:

r = df["counts"].rolling(15)
df["is_error"] = r.min().eq(0) & r.max().eq(0) 
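A small hypothetical window shows why the max-only check is not enough when negative values are possible. It also shows that taking abs() first gives an equivalent single check (the abs() variant is my own suggestion, not part of the answer).

```python
import pandas as pd

# Hypothetical data: a 15-hour window of zeros except a single -1
idx = pd.date_range("2017-03-31", periods=15, freq="h", tz="UTC")
counts = pd.Series([0.0] * 14 + [-1.0], index=idx)

r = counts.rolling(15)
max_only = r.max().eq(0)                         # True on the last row: max is 0 despite the -1
min_and_max = r.min().eq(0) & r.max().eq(0)      # False: the min is -1
abs_max = counts.abs().rolling(15).max().eq(0)   # equivalent single check
```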

Upvotes: 1
