Reputation: 71
is there a pythonic solution with pandas for the given problem?
Supposed I have masked Series called A
[False, True, False, False, False, True, False, False]
I want to get a series which counts the False values since the last occurence of True. For the example above this would output something like:
[NaN, 0, 1, 2, 3, 0, 1, 2]
And as a bonus also summed up to:
[NaN, 3, 2]
containing only the maximum lengths of all consecutive occurences of False values after a True value.
Many thanks beforehand
draj
Upvotes: 0
Views: 119
Reputation: 18377
An adapation from @Andy L's answer to a dataframe:
df = pd.DataFrame({'values':[False, True, False, False, False, True, False, False]})
df['cumsum'] = (~df['values']).cumsum() - (~df['values']).cumsum().where(df['values']).ffill()
grouped = pd.concat([df.loc[df[df['values']==True].index-1,:],df.tail(1)])
Output:
values cumsum
0 False NaN
1 True 0.0
2 False 1.0
3 False 2.0
4 False 3.0
5 True 0.0
6 False 1.0
7 False 2.0
Grouped output:
values cumsum
0 False NaN
4 False 3.0
7 False 2.0
Upvotes: 1
Reputation: 25259
Try this
out = (~A).cumsum() - (~A).cumsum().where(A).ffill()
Out[1372]:
0 NaN
1 0.0
2 1.0
3 2.0
4 3.0
5 0.0
6 1.0
7 2.0
dtype: float64
If you want to get sum, try this from out
above
out_sum = out[A.shift(-1, fill_value=True) & out.ne(0)]
Out[1411]:
0 NaN
4 3.0
7 2.0
dtype: float64
Upvotes: 3
Reputation: 6132
If you want to only work with Series you can adapt @kiki's answer this way:
s = pd.Series([False, True, False, False, False, True, False, False])
(s.groupby(s.cumsum()).count()-1).replace(0,np.nan).tolist()
Anyways I think that if you want to understand what's happening under the hood, @kiki answer is a bit more transparent
[nan, 3.0, 2.0]
Also, for the complete Series it's just:
(s.groupby(s.cumsum()).cumcount())
0 0
1 0
2 1
3 2
4 3
5 0
6 1
7 2
Please tell me if having a zero instead of a nan
is a problem in the first row.
Upvotes: 3
Reputation: 603
I think the cumsum function can help you to create a kind of id at each True apparition. Then you are able to groupby and do what you need
res = pd.DataFrame([False, True, False, False, False, True, False, False],columns=['val'])
res['cumsum'] = res.val.cumsum()
res.groupby("cumsum").count() - 1
Output:
val
cumsum
0 0
1 3
2 2
Upvotes: 1