pjw

Reputation: 2335

Count groups of values in Pandas series

I have a Pandas (pandas==0.23.4) datetime-indexed dataframe df with a column named value_id.

value_id contains groups of float values (either 5.0 or 6.0) separated by groups of NaN. I would like to count the number of contiguous groups of 5.0 and of 6.0 separately. A group only counts if it contains at least three consecutive values.

For example:

In [1]: print df.value_id
timestamp
2019-01-06 17:42:08    NaN
2019-01-06 17:45:08    5.0
2019-01-06 17:48:08    5.0
2019-01-06 17:51:08    5.0
2019-01-06 17:54:08    NaN
2019-01-06 17:57:08    NaN
2019-01-06 18:00:08    NaN
2019-01-06 18:03:08    NaN
2019-01-06 18:06:08    NaN
2019-01-06 18:09:08    NaN
2019-01-06 18:12:08    6.0
2019-01-06 18:15:08    6.0
2019-01-06 19:54:09    NaN
2019-01-06 19:57:09    5.0
2019-01-06 20:00:08    5.0
2019-01-06 20:03:08    5.0
2019-01-06 20:06:09    NaN
2019-01-06 20:09:08    NaN
2019-01-06 20:12:08    NaN
2019-01-06 20:15:09    NaN
2019-01-06 20:18:08    NaN
2019-01-06 20:21:09    NaN
2019-01-06 20:24:09    NaN
2019-01-07 19:09:07    NaN
2019-01-07 19:12:06    NaN
2019-01-07 19:15:06    5.0
2019-01-07 19:18:06    5.0
2019-01-07 19:21:07    5.0
2019-01-07 19:24:07    5.0
2019-01-07 19:27:07    NaN
2019-01-07 19:30:07    NaN
2019-01-07 19:33:06    NaN
2019-01-07 19:36:07    NaN
2019-01-07 19:39:07    NaN
2019-01-07 19:42:06    NaN
2019-01-07 19:45:06    NaN
2019-01-07 19:48:06    NaN
2019-01-07 19:51:06    6.0
2019-01-07 19:54:07    6.0
2019-01-07 19:57:06    6.0
Name: value_id, dtype: float64

If I had two variables named count1 (for the 5.0 value groups) and count2 (for the 6.0 value groups), the resulting counts assigned for the above example would be:

count1: 3

count2: 1

Upvotes: 1

Views: 71

Answers (2)

BENY

Reputation: 323366

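If I understand correctly, you can create the group key with isnull().cumsum(), then use value_counts to get the size of each run and count the runs that have at least three values (s is the series of values):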

s.groupby(s.isnull().cumsum()).value_counts().ge(3).sum(level=1)
Out[1026]: 
timestamp
5.0    3.0
6.0    1.0
Name: timestamp, dtype: float64
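To make the steps in that one-liner explicit, here is a minimal sketch on a small made-up series. The data and the names key, run_sizes and counts are illustrative and not from the question. It assumes a recent pandas, where sum(level=1) is no longer available (it was removed in pandas 2.0), so the last step uses groupby(level=1).sum() instead.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Illustrative data: runs of 5.0 and 6.0 separated by NaN.
s = pd.Series(
    [nan, 5, 5, 5, nan, nan, 6, 6, nan, 5, 5, 5, nan, 5, 5, 5, 5, nan, 6, 6, 6],
    dtype=float,
)

# Every NaN bumps the counter, so all values between two NaNs share one key.
key = s.isnull().cumsum()

# Number of occurrences of each value within each key; NaN rows are dropped.
run_sizes = s.groupby(key).value_counts()

# Keep the (key, value) pairs with at least 3 occurrences,
# then count them per value (index level 1).
counts = run_sizes.ge(3).groupby(level=1).sum()
print(counts)
```

Note that the key only changes at a NaN. If 5.0 and 6.0 can appear next to each other, or interleaved, without a NaN in between, value_counts counts occurrences within the NaN-delimited block rather than strictly consecutive values.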

Upvotes: 1

Brendan

Reputation: 4011

Perhaps not the most elegant, but you can use shift to check both that the next two items have the same value and that the previous item is not part of the same group. This flags only the first row of each qualifying group:

# Flag the first row of every run of at least three 5.0 values.
df['fives'] = ((df['value_id'] == 5) & (df['value_id'].shift(-1) == 5)
                & (df['value_id'].shift(-2) == 5)
                & (df['value_id'].shift(1) != 5)).astype(int)
# Same check for runs of 6.0.
df['sixes'] = ((df['value_id'] == 6) & (df['value_id'].shift(-1) == 6)
                & (df['value_id'].shift(-2) == 6)
                & (df['value_id'].shift(1) != 6)).astype(int)

df[['fives','sixes']].sum()
fives    3
sixes    1
dtype: int64
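The same shift idea can be generalized so that the values and the minimum length are not hard-coded. The following is a sketch only: count_runs and min_len are hypothetical names introduced here, and the sample series is made up for illustration. It labels each run by comparing every row with the previous one, then counts the runs that are long enough.

```python
import numpy as np
import pandas as pd


def count_runs(s, min_len=3):
    """Count runs of identical consecutive non-NaN values with length >= min_len."""
    # A new run starts wherever the value differs from the previous row.
    # NaN never equals NaN, so each NaN gets its own run id; NaN rows are dropped below.
    run_id = s.ne(s.shift()).cumsum()
    valid = s.notna()
    # One row per run: the run's value ("first") and its length ("size").
    runs = s[valid].groupby(run_id[valid]).agg(["first", "size"])
    long_runs = runs[runs["size"] >= min_len]
    # Number of qualifying runs per distinct value.
    return long_runs["first"].value_counts().sort_index()


nan = np.nan
# Illustrative data; note the 5.0 run followed directly by a 6.0 run.
sample = pd.Series([nan, 5, 5, 5, nan, 6, 6, nan, 5, 5, 5, 5, 6, 6, 6], dtype=float)
result = count_runs(sample)
print(result)
```

Because a run is defined by a change in value rather than by NaN separators, a 5.0 run directly followed by a 6.0 run is treated as two separate runs.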

Upvotes: 1
