Reputation: 789
I have a pandas series pd.Series([-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1]). How can I convert it into pd.Series([-1, 0, 0, 0, -5, -5, 0, 0, 0, -1])?
The filtering condition: if -1 occurs 3 or more times in a streak, keep the first occurrence and discard the rest.
The first streak of -1s has length 3, so we keep one -1 and discard the rest. After the first 3 values the streak breaks (the value is now 0). Similarly, the last streak of -1s has length 4, so we keep one -1 and discard the rest.
The filter only applies to -1; -5 should be left as is.
Thanks
PS: I thought about groupby, but I don't think it honors streaks the way I described above.
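For reference, the rule can be written as a plain-Python loop. Unlike pandas' groupby, itertools.groupby only groups consecutive equal values, so it does follow streaks. This is a slow, non-vectorized sketch; the name collapse_neg1_streaks and the threshold parameter are hypothetical, and it assumes -1 streaks shorter than the threshold are kept as is (which is how the answers below read the rule).

```python
from itertools import groupby

import pandas as pd


def collapse_neg1_streaks(s: pd.Series, threshold: int = 3) -> pd.Series:
    """Keep only the first -1 of every streak of at least `threshold` -1s (hypothetical helper)."""
    keep = []
    # itertools.groupby yields one (value, run) pair per streak of consecutive equal values.
    for value, run in groupby(s.tolist()):
        n = len(list(run))
        if value == -1 and n >= threshold:
            # Long -1 streak: keep only its first element.
            keep.extend([True] + [False] * (n - 1))
        else:
            # Any other streak (including short -1 streaks) is kept whole.
            keep.extend([True] * n)
    return s.loc[keep]


s = pd.Series([-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1])
print(collapse_neg1_streaks(s).tolist())
# [-1, 0, 0, 0, -5, -5, 0, 0, 0, -1]
```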
Upvotes: 2
Views: 1706
Reputation: 25269
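Create a boolean mask m marking the positions where the value changes, i.e. the start of each streak. Group s by m.cumsum() (one group per streak) and use transform to flag the streaks that contain fewer than 3 -1s; assign that to mask m1. In m | m1, only the second and later elements of a -1 streak of length >= 3 are False, so (m | m1).cumsum() gives all elements of such a streak the same number and every other element a unique number. Finally, use duplicated on that cumsum to build the mask and slice s with it.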
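import pandas as pd

s = pd.Series([-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1])
m = s.diff().ne(0)  # True at the start of each streak
m1 = s.groupby(m.cumsum()).transform(lambda x: x.eq(-1).sum() < 3)  # streaks with fewer than 3 -1s
m2 = ~((m | m1).cumsum().duplicated())  # False only for the 2nd+ elements of long -1 streaks
s[m2]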
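The same three masks can be wrapped in a function with the value and the minimum streak length as parameters. This is a sketch; the name keep_first_of_long_runs and the parameters target and min_len are hypothetical, not part of the answer.

```python
import pandas as pd


def keep_first_of_long_runs(s: pd.Series, target=-1, min_len=3) -> pd.Series:
    # True at the first element and wherever the value changes (start of a streak).
    m = s.diff().ne(0)
    # True for every element of a streak that contains fewer than min_len `target`s.
    m1 = s.groupby(m.cumsum()).transform(lambda x: x.eq(target).sum() < min_len).astype(bool)
    # Only the 2nd, 3rd, ... elements of a long `target` streak are False in m | m1,
    # so they repeat the cumsum value of the streak's first element.
    m2 = ~(m | m1).cumsum().duplicated()
    return s[m2]
```

With min_len=4 on the question's sample, the 3-long leading streak is kept whole and only the 4-long trailing streak is collapsed.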
Step by step:
I modified your sample to include a case where -1 appears in only 2 consecutive rows, which we should keep.
s
Out[148]:
0 -1
1 -1
2 -1
3 0
4 -1
5 -1
6 0
7 0
8 -5
9 -5
10 0
11 0
12 0
13 -1
14 -1
15 -1
16 -1
dtype: int64
m = s.diff().ne(0)
Out[150]:
0 True
1 False
2 False
3 True
4 True
5 False
6 True
7 False
8 True
9 False
10 True
11 False
12 False
13 True
14 False
15 False
16 False
dtype: bool
m1 = s.groupby(m.cumsum()).transform(lambda x: x.eq(-1).sum() < 3)
Out[152]:
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 False
14 False
15 False
16 False
dtype: bool
m2 = ~((m | m1).cumsum().duplicated())
Out[159]:
0 True
1 False
2 False
3 True
4 True
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 False
15 False
16 False
dtype: bool
In [168]: s[m2]
Out[168]:
0 -1
3 0
4 -1
5 -1
6 0
7 0
8 -5
9 -5
10 0
11 0
12 0
13 -1
dtype: int64
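The walkthrough goes from m1 straight to m2; the intermediate key (m | m1).cumsum() is where the long streaks get merged. A short sketch that prints it for the modified sample, reusing the answer's expressions (the .astype(bool) and the name key are additions for illustration):

```python
import pandas as pd

s = pd.Series([-1, -1, -1, 0, -1, -1, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1])
m = s.diff().ne(0)
m1 = s.groupby(m.cumsum()).transform(lambda x: x.eq(-1).sum() < 3).astype(bool)

# Elements of a -1 streak of length >= 3 (positions 0-2 and 13-16) share one key;
# every other element, including the 2-long -1 streak, gets its own key.
key = (m | m1).cumsum()
print(key.tolist())
# [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 12]
```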
Upvotes: 0
Reputation: 221774
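With some SciPy tools:
import numpy as np
from scipy.ndimage import binary_opening, binary_erosion

def keep_first_neg1s(s, W=3):
    k1 = np.ones(W, dtype=bool)  # opening kernel: keeps only runs of at least W True values
    k2 = np.ones(2, dtype=bool)  # erosion kernel: trims the first element off each surviving run
    m = s == -1
    return s[~binary_erosion(binary_opening(m, k1), k2) | ~m]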
A simpler version, hopefully more performant too:
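def keep_first_neg1s_v2(s, W=3):
    m1 = binary_opening(s.to_numpy() == -1, np.ones(W, dtype=bool))  # -1 runs of length >= W
    # drop an element only if it and its predecessor both belong to such a run
    return s[~(m1 & np.r_[False, m1[:-1]])]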
Runs on the given sample s:
# Using .tolist() simply for better visualization
In [47]: s.tolist()
Out[47]: [-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1]
In [48]: keep_first_neg1s(s,W=3).tolist()
Out[48]: [-1, 0, 0, 0, -5, -5, 0, 0, 0, -1]
In [49]: keep_first_neg1s(s,W=4).tolist()
Out[49]: [-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1]
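The key property used here is that a binary opening with a length-W structuring element removes every run of True shorter than W and leaves longer runs intact. A small sketch on the modified sample with a 2-long -1 streak, assuming the scipy.ndimage namespace (the names is_neg1, long_runs, prev and kept are illustrative):

```python
import numpy as np
import pandas as pd
from scipy.ndimage import binary_opening

s = pd.Series([-1, -1, -1, 0, -1, -1, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1])
is_neg1 = s.eq(-1).to_numpy()

# The 2-long -1 streak (positions 4-5) is removed by the opening; the longer ones survive.
long_runs = binary_opening(is_neg1, structure=np.ones(3, dtype=bool))
print(long_runs.astype(int).tolist())
# [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Drop an element only if it and its predecessor both belong to a long run.
prev = np.r_[False, long_runs[:-1]]
kept = s[~(long_runs & prev)]
print(kept.tolist())
# [-1, 0, -1, -1, 0, 0, -5, -5, 0, 0, 0, -1]
```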
Upvotes: 2
Reputation: 59304
IIUC, pandas masking and groupby:
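def remove_streaks(s, T):
    '''T is the threshold (minimum streak length to collapse)
    '''
    # one group per -1 streak; every other element is its own group
    g = s.groupby(s.diff().ne(0).cumsum() + s.ne(-1).cumsum())
    # same key for all elements of a -1 streak of length >= T, unique otherwise
    mask = g.transform('size').lt(T).cumsum() + s.diff().ne(0).cumsum()
    return s.groupby(mask).first()
>>> remove_streaks(s, 4).tolist()
[-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1]
>>> remove_streaks(s, 3).tolist()
[-1, 0, 0, 0, -5, -5, 0, 0, 0, -1]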
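s.groupby(mask).first() returns a Series indexed by the group keys rather than by the original positions. If the original index matters, one option is to keep the first row of each key with duplicated, since the keys are non-decreasing and repeat only within a collapsed streak. This is a sketch, not part of the answer; the name remove_streaks_keep_index is hypothetical.

```python
import pandas as pd


def remove_streaks_keep_index(s: pd.Series, T: int) -> pd.Series:
    # One group per -1 streak; every other element gets its own group.
    g = s.groupby(s.diff().ne(0).cumsum() + s.ne(-1).cumsum())
    # Same key for all elements of a -1 streak of length >= T, unique otherwise.
    key = g.transform('size').lt(T).cumsum() + s.diff().ne(0).cumsum()
    # Keep the first element of each key together with its original index label.
    return s[~key.duplicated()]
```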
Upvotes: 1
Reputation: 92904
With a conditional mask:
In [43]: s = pd.Series([-1, -1, -1, 0, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1 , -1])
In [44]: m = (s.diff() == 0) & (s.eq(-1))
In [45]: s[~m]
Out[45]:
0 -1
3 0
4 0
5 0
6 -5
7 -5
8 0
9 0
10 0
11 -1
dtype: int64
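This mask drops every repeated -1 regardless of streak length, so it matches the expected output here because every -1 streak in the sample has at least 3 elements. A sketch of one way to add the length condition (the names run_id, run_len and m_len are illustrative), checked against a sample with a 2-long -1 streak:

```python
import pandas as pd

s = pd.Series([-1, -1, -1, 0, -1, -1, 0, 0, -5, -5, 0, 0, 0, -1, -1, -1, -1])

# Mask from the answer: any -1 equal to its predecessor is dropped.
m = (s.diff() == 0) & s.eq(-1)
print(s[~m].tolist())
# [-1, 0, -1, 0, 0, -5, -5, 0, 0, 0, -1]   (the 2-long streak is collapsed too)

# Add the length condition: only streaks of 3 or more elements are collapsed.
run_id = s.ne(s.shift()).cumsum()
run_len = s.groupby(run_id).transform('size')
m_len = m & run_len.ge(3)
print(s[~m_len].tolist())
# [-1, 0, -1, -1, 0, 0, -5, -5, 0, 0, 0, -1]
```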
Upvotes: 2