splinter

Reputation: 3907

Finding the number of observations before a change in a Series occurs (pandas/numpy)

Given a Series, I would like to efficiently compute, for each position, how many observations there are before the next change in the series occurs. Here is a simple example:

import pandas as pd

ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])

print(ser)

0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0

I would like to apply a function to ser which would result in:

0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1

As I am dealing with large series, I would prefer a fast solution that does not involve looping.

Upvotes: 0

Views: 81

Answers (2)

Divakar

Reputation: 221654

Here's a NumPy approach based on this post -

import numpy as np

def array_cumcount_descending(a):
    # Indices where the value differs from its predecessor,
    # i.e. the start of each new block (excluding position 0).
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    shift_arr = -np.ones(a.size, dtype=int)

    if len(idx) >= 1:
        shift_arr[0] = idx[0]
        shift_arr[idx[:-1]] = idx[1:] - idx[:-1] - 1
        shift_arr[idx[-1]] = a.size - idx[-1] - 1
    else:
        # No changes at all: one block spanning the whole array.
        shift_arr[0] = a.size
    return shift_arr.cumsum()

Sample run -

In [70]: ser
Out[70]: 
0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0
dtype: float64

In [71]: array_cumcount_descending(ser.values)
Out[71]: array([4, 3, 2, 1, 3, 2, 1, 1, 1])
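To see why this works, here is a minimal walk-through of the intermediate arrays for the same example input (it assumes the series contains at least one change, i.e. the `len(idx) >= 1` branch):

```python
import numpy as np

a = np.array([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])

# Positions where the value changes relative to the previous element.
idx = np.flatnonzero(a[1:] != a[:-1]) + 1          # array([4, 7, 8])

# Start from -1 everywhere (each step decrements the countdown), then
# overwrite the first element of each block so the running sum jumps
# back up to that block's length.
shift_arr = -np.ones(a.size, dtype=int)
shift_arr[0] = idx[0]                              # length of the first block
shift_arr[idx[:-1]] = idx[1:] - idx[:-1] - 1
shift_arr[idx[-1]] = a.size - idx[-1] - 1

print(shift_arr)           # [ 4 -1 -1 -1  2 -1 -1  0  0]
print(shift_arr.cumsum())  # [4 3 2 1 3 2 1 1 1]
```

The cumulative sum turns the per-block "reset" values into the descending counts.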

Upvotes: 1

AChampion

Reputation: 30288

You can use groupby with cumcount:

>>> ser.groupby(ser).cumcount(ascending=False)+1
0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1
dtype: int64

As per @DSM's comment, if the same value appears in multiple separate blocks, the above will not work, but you can extend the solution with:

>>> ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 1.2, 1.2, 1.2, 4, 3])
>>> ser.groupby((ser != ser.shift()).cumsum()).cumcount(ascending=False)+1
0     4
1     3
2     2
3     1
4     3
5     2
6     1
7     3
8     2
9     1
10    1
11    1
dtype: int64
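The grouping key can be inspected on its own; a quick sketch, using the same extended series, of what `(ser != ser.shift()).cumsum()` produces:

```python
import pandas as pd

ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 1.2, 1.2, 1.2, 4, 3])

# ser.shift() misaligns the series by one position, so a True marks
# every spot where a new block of equal values starts; cumsum() then
# gives each consecutive block its own integer label.
block_id = (ser != ser.shift()).cumsum()
print(block_id.tolist())   # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]
```

Grouping by these labels instead of the raw values keeps the two `1.2` blocks separate, which is exactly what plain `groupby(ser)` gets wrong.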

Upvotes: 1
