splinter

Reputation: 3907

Finding the number of observations before a change in a Series occurs (pandas/numpy)

Given a Series, I would like to efficiently compute, for each position, how many observations there are before the next change in the series occurs. Here is a simple example:

import pandas as pd

ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])

print(ser)

0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0

I would like to apply a function to ser which would result in:

0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1

As I am dealing with large series, I would prefer a fast solution that does not involve looping.

Upvotes: 0

Views: 81

Answers (2)

Divakar

Reputation: 221654

Here's a NumPy approach based on this post -

import numpy as np

def array_cumcount_descending(a):
    # Indices where the value differs from its predecessor,
    # i.e. the start of each new block (excluding position 0).
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    shift_arr = -np.ones(a.size, dtype=int)

    if len(idx) >= 1:
        shift_arr[0] = idx[0]
        shift_arr[idx[:-1]] = idx[1:] - idx[:-1] - 1
        shift_arr[idx[-1]] = a.size - idx[-1] - 1
    else:
        # No changes at all: one block spanning the whole array.
        shift_arr[0] = a.size
    return shift_arr.cumsum()

Sample run -

In [70]: ser
Out[70]: 
0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0
dtype: float64

In [71]: array_cumcount_descending(ser.values)
Out[71]: array([4, 3, 2, 1, 3, 2, 1, 1, 1])
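To see why this works, here is a minimal walk-through of the intermediate arrays for the same example input (it assumes the series contains at least one change, i.e. the `len(idx) >= 1` branch):

```python
import numpy as np

a = np.array([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])

# Positions where the value changes relative to the previous element.
idx = np.flatnonzero(a[1:] != a[:-1]) + 1          # array([4, 7, 8])

# Start from -1 everywhere (each step decrements the countdown), then
# overwrite the first element of each block so the running sum jumps
# back up to that block's length.
shift_arr = -np.ones(a.size, dtype=int)
shift_arr[0] = idx[0]                              # length of the first block
shift_arr[idx[:-1]] = idx[1:] - idx[:-1] - 1
shift_arr[idx[-1]] = a.size - idx[-1] - 1

print(shift_arr)           # [ 4 -1 -1 -1  2 -1 -1  0  0]
print(shift_arr.cumsum())  # [4 3 2 1 3 2 1 1 1]
```

The cumulative sum turns the per-block "reset" values into the descending counts.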

Upvotes: 1

AChampion

Reputation: 30288

You can use groupby with cumcount:

>>> ser.groupby(ser).cumcount(ascending=False)+1
0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1
dtype: int64

As per @DSM's comment, if the same value appears in multiple separate blocks, the above will not work, but you can extend the solution with:

>>> ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 1.2, 1.2, 1.2, 4, 3])
>>> ser.groupby((ser != ser.shift()).cumsum()).cumcount(ascending=False)+1
0     4
1     3
2     2
3     1
4     3
5     2
6     1
7     3
8     2
9     1
10    1
11    1
dtype: int64
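The grouping key can be inspected on its own; a quick sketch, using the same extended series, of what `(ser != ser.shift()).cumsum()` produces:

```python
import pandas as pd

ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 1.2, 1.2, 1.2, 4, 3])

# ser.shift() misaligns the series by one position, so a True marks
# every spot where a new block of equal values starts; cumsum() then
# gives each consecutive block its own integer label.
block_id = (ser != ser.shift()).cumsum()
print(block_id.tolist())   # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]
```

Grouping by these labels instead of the raw values keeps the two `1.2` blocks separate, which is exactly what plain `groupby(ser)` gets wrong.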

Upvotes: 1
