Reputation: 3907
Given a Series, I would like to efficiently compute, for each observation, how many observations remain before the next change in value. Here is a simple example:
import pandas as pd

ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])
print(ser)
0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0
I would like to apply a function to ser which would result in:
0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1
As I am dealing with large series, I would prefer a fast solution that does not involve looping.
Upvotes: 0
Views: 81
Reputation: 221654
Here's a NumPy approach based on this post -
import numpy as np

def array_cumcount_descending(a):
    # Positions where the value differs from its predecessor (run boundaries)
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    # Fill with -1 so the cumulative sum counts down within each run
    shift_arr = -np.ones(a.size, dtype=int)
    if len(idx) >= 1:
        # The running sum starts at the first run's length
        shift_arr[0] = idx[0]
        # At each boundary, reset the running sum to the next run's length
        shift_arr[idx[:-1]] = idx[1:] - idx[:-1] - 1
        shift_arr[idx[-1]] = a.size - idx[-1] - 1
    else:
        # No changes at all: the whole array is a single run
        shift_arr[0] = a.size
    return shift_arr.cumsum()
Sample run -
In [70]: ser
Out[70]:
0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0
dtype: float64
In [71]: array_cumcount_descending(ser.values)
Out[71]: array([4, 3, 2, 1, 3, 2, 1, 1, 1])
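If you need the result back as a Series aligned to the original index, you can wrap the returned array; a minimal sketch -

In [72]: pd.Series(array_cumcount_descending(ser.values), index=ser.index)
Out[72]:
0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1
dtype: int64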
Upvotes: 1
Reputation: 30288
You can use groupby with cumcount:
>>> ser.groupby(ser).cumcount(ascending=False)+1
0    4
1    3
2    2
3    1
4    3
5    2
6    1
7    1
8    1
dtype: int64
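Note that this groups on the raw values, so it breaks down if the same value appears in more than one consecutive run. A minimal sketch of the failure, using a small hypothetical series:

>>> ser2 = pd.Series([1.2, 1.2, 2, 2, 1.2, 1.2])
>>> ser2.groupby(ser2).cumcount(ascending=False) + 1
0    4
1    3
2    2
3    1
4    2
5    1
dtype: int64

Here the first run of 1.2 counts down from 4 instead of 2 because both runs of 1.2 land in the same group.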
As per @DSM's comment, you can extend the solution to handle multiple blocks of the same value by grouping on consecutive runs instead:
>>> ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 1.2, 1.2, 1.2, 4, 3])
>>> ser.groupby((ser != ser.shift()).cumsum()).cumcount(ascending=False)+1
0     4
1     3
2     2
3     1
4     3
5     2
6     1
7     3
8     2
9     1
10    1
11    1
dtype: int64
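The trick is that (ser != ser.shift()).cumsum() gives each consecutive run its own integer label (the first element compares against NaN, so it always starts a new run), which means equal values in different runs end up in different groups. For the extended series above, the intermediate grouper looks like this:

>>> (ser != ser.shift()).cumsum()
0     1
1     1
2     1
3     1
4     2
5     2
6     2
7     3
8     3
9     3
10    4
11    5
dtype: int64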
Upvotes: 1