Reputation: 2335
I have a Pandas (pandas==0.23.4) datetime-indexed dataframe df with a column named value_id. value_id contains groups of float values (either 5.0 or 6.0) and groups of NaN. I would like to count the number of continuous groups for both 5.0 and 6.0. The groups must contain at least three consecutive values.
For example:
In [1]: print df.value_id
timestamp
2019-01-06 17:42:08 NaN
2019-01-06 17:45:08 5.0
2019-01-06 17:48:08 5.0
2019-01-06 17:51:08 5.0
2019-01-06 17:54:08 NaN
2019-01-06 17:57:08 NaN
2019-01-06 18:00:08 NaN
2019-01-06 18:03:08 NaN
2019-01-06 18:06:08 NaN
2019-01-06 18:09:08 NaN
2019-01-06 18:12:08 6.0
2019-01-06 18:15:08 6.0
2019-01-06 19:54:09 NaN
2019-01-06 19:57:09 5.0
2019-01-06 20:00:08 5.0
2019-01-06 20:03:08 5.0
2019-01-06 20:06:09 NaN
2019-01-06 20:09:08 NaN
2019-01-06 20:12:08 NaN
2019-01-06 20:15:09 NaN
2019-01-06 20:18:08 NaN
2019-01-06 20:21:09 NaN
2019-01-06 20:24:09 NaN
2019-01-07 19:09:07 NaN
2019-01-07 19:12:06 NaN
2019-01-07 19:15:06 5.0
2019-01-07 19:18:06 5.0
2019-01-07 19:21:07 5.0
2019-01-07 19:24:07 5.0
2019-01-07 19:27:07 NaN
2019-01-07 19:30:07 NaN
2019-01-07 19:33:06 NaN
2019-01-07 19:36:07 NaN
2019-01-07 19:39:07 NaN
2019-01-07 19:42:06 NaN
2019-01-07 19:45:06 NaN
2019-01-07 19:48:06 NaN
2019-01-07 19:51:06 6.0
2019-01-07 19:54:07 6.0
2019-01-07 19:57:06 6.0
Name: value_id, dtype: float64
If I had two variables named count1 (for the 5.0 value groups) and count2 (for the 6.0 value groups), the resulting counts assigned for the above example would be:
count1: 3
count2: 1
Upvotes: 1
Reputation: 323366
IIUC, create the group key with cumsum on the null mask, then call value_counts within each group (here s is the Series being counted, e.g. df.value_id):
s.groupby(s.isnull().cumsum()).value_counts().ge(3).sum(level=1)
Out[1026]:
timestamp
5.0 3.0
6.0 1.0
Name: timestamp, dtype: float64
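For readers on newer pandas, where aggregations such as sum no longer accept a level argument, the same idea can be sketched step by step as below. This is only an illustration: the sample Series and the names block, per_block, long_runs and counts are made up here and are not part of the answer above. It also assumes every NaN-delimited block holds a single run of one value; if the same value appeared in two separate stretches inside one block, value_counts would add them together.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample: NaN-separated runs of 5.0 and 6.0.
s = pd.Series(
    [np.nan, 5.0, 5.0, 5.0, np.nan, 6.0, 6.0, np.nan,
     5.0, 5.0, 5.0, 5.0, np.nan, 6.0, 6.0, 6.0],
    name="value_id",
)

# Every NaN bumps the running total, so each NaN-delimited block gets its own key.
block = s.isnull().cumsum().rename("block")

# Count each value inside each block (value_counts drops NaN by default).
per_block = s.groupby(block).value_counts()

# Keep the (block, value) pairs seen at least 3 times, then count them per value.
long_runs = per_block[per_block >= 3]
counts = long_runs.groupby(level=1).size()
print(counts)
```

In this sample the two-element run of 6.0 is filtered out, so only the runs of length three or more contribute to counts.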
Upvotes: 1
Reputation: 4011
Perhaps not the most elegant, but you can use shift both to check that the next two items have the same value and to check that the previous value is not part of the same group:
df['fives'] = ((df['value_id'] == 5) & (df['value_id'].shift(-1) == 5)
& (df['value_id'].shift(-2) == 5)
& (df['value_id'].shift(1) != 5)).astype(int)
df['sixes'] = ((df['value_id'] == 6) & (df['value_id'].shift(-1) == 6)
& (df['value_id'].shift(-2) == 6)
& (df['value_id'].shift(1) != 6)).astype(int)
df[['fives','sixes']].sum()
fives 3
sixes 1
dtype: int64
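If the set of values or the minimum length may change, a related variant is to label runs with shift plus cumsum instead of hard-coding one comparison per offset. This is a different technique from the one above, offered as a rough sketch: count_runs and min_len are hypothetical names, and the sample Series is made up for illustration.

```python
import numpy as np
import pandas as pd

def count_runs(s, min_len=3):
    """Count runs of identical non-NaN values at least min_len long (hypothetical helper)."""
    # A run starts wherever a value differs from its predecessor.
    # NaN never compares equal to NaN, so each NaN forms its own run.
    run_id = s.ne(s.shift()).cumsum()
    valid = s.notna()
    # Length of every non-NaN run, indexed by (run id, value).
    run_len = s[valid].groupby([run_id[valid], s[valid]]).size()
    # Keep the long runs and count how many there are per value.
    return run_len[run_len >= min_len].groupby(level=1).size()

# Made-up sample: a 5.0 run directly followed by a 6.0 run, a short 5.0 run,
# and a final 5.0 run of length four.
s = pd.Series([np.nan, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, np.nan,
               5.0, 5.0, np.nan, 5.0, 5.0, 5.0, 5.0])
counts = count_runs(s)
print(counts)
```

Because runs are keyed on changes in value rather than on NaN positions, adjacent runs of different values are counted separately, and a longer minimum length only needs a different min_len.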
Upvotes: 1