Count groups of values in Pandas series

Question

I have a Pandas (pandas==0.23.4) datetime-indexed dataframe df with a column named value_id.

value_id contains groups of float values (either 5.0 or 6.0) separated by groups of NaN. I would like to count the number of contiguous groups for 5.0 and for 6.0 separately. Only groups with at least three consecutive values should be counted.

For example:

In [1]: print(df.value_id)
timestamp
2019-01-06 17:42:08    NaN
2019-01-06 17:45:08    5.0
2019-01-06 17:48:08    5.0
2019-01-06 17:51:08    5.0
2019-01-06 17:54:08    NaN
2019-01-06 17:57:08    NaN
2019-01-06 18:00:08    NaN
2019-01-06 18:03:08    NaN
2019-01-06 18:06:08    NaN
2019-01-06 18:09:08    NaN
2019-01-06 18:12:08    6.0
2019-01-06 18:15:08    6.0
2019-01-06 19:54:09    NaN
2019-01-06 19:57:09    5.0
2019-01-06 20:00:08    5.0
2019-01-06 20:03:08    5.0
2019-01-06 20:06:09    NaN
2019-01-06 20:09:08    NaN
2019-01-06 20:12:08    NaN
2019-01-06 20:15:09    NaN
2019-01-06 20:18:08    NaN
2019-01-06 20:21:09    NaN
2019-01-06 20:24:09    NaN
2019-01-07 19:09:07    NaN
2019-01-07 19:12:06    NaN
2019-01-07 19:15:06    5.0
2019-01-07 19:18:06    5.0
2019-01-07 19:21:07    5.0
2019-01-07 19:24:07    5.0
2019-01-07 19:27:07    NaN
2019-01-07 19:30:07    NaN
2019-01-07 19:33:06    NaN
2019-01-07 19:36:07    NaN
2019-01-07 19:39:07    NaN
2019-01-07 19:42:06    NaN
2019-01-07 19:45:06    NaN
2019-01-07 19:48:06    NaN
2019-01-07 19:51:06    6.0
2019-01-07 19:54:07    6.0
2019-01-07 19:57:06    6.0
Name: value_id, dtype: float64

If I had two variables named count1 (for the 5.0 value groups) and count2 (for the 6.0 value groups), the resulting counts assigned for the above example would be:

count1: 3

count2: 1

BENY · Accepted Answer

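IIUC, you can build the group key from the cumulative sum of the NaN mask (every NaN starts a new key), call value_counts within each group, and then count per value how many groups have at least three entries. Here s is df.value_id: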

s.groupby(s.isnull().cumsum()).value_counts().ge(3).sum(level=1)
Out[1026]: 
timestamp
5.0    3.0
6.0    1.0
Name: timestamp, dtype: float64
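
The sum(level=1) call above works on the pandas 0.23.4 from the question, but the level argument was removed from Series.sum in pandas 2.0. As a rough sketch of the same cumsum idea for newer versions, something like the following should work. The count_runs helper and its min_len parameter are hypothetical names introduced here, and the sample series is a stand-in for df.value_id that uses a plain integer index instead of timestamps:

```python
import numpy as np
import pandas as pd

# Stand-in for df.value_id from the question: same values in the same order,
# but on a plain integer index instead of timestamps (assumption, for brevity).
nan = np.nan
s = pd.Series(
    [nan] + [5.0] * 3 + [nan] * 6 + [6.0] * 2 + [nan] + [5.0] * 3
    + [nan] * 9 + [5.0] * 4 + [nan] * 8 + [6.0] * 3,
    name="value_id",
)


def count_runs(series, min_len=3):
    """Hypothetical helper: count NaN-delimited runs per value with >= min_len rows."""
    present = series.notna()
    # Every NaN bumps the cumulative sum, so rows between two NaNs share a key.
    run_id = series.isna().cumsum()[present]
    values = series[present]
    # Length of each (run key, value) combination.
    run_lengths = values.groupby([run_id.values, values.values]).size()
    # Keep only the long runs, then count how many remain for each value.
    long_runs = run_lengths[run_lengths >= min_len]
    return long_runs.groupby(level=1).size()


counts = count_runs(s)
count1 = int(counts.get(5.0, 0))  # runs of 5.0 with at least three values
count2 = int(counts.get(6.0, 0))  # runs of 6.0 with at least three values
print(count1, count2)
```

Like the one-liner, this assumes that runs are always separated by at least one NaN. If a 5.0 run could be followed directly by a 6.0 run, the key would need to change on every value change rather than only on NaN.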
