draj

Reputation: 71

Pandas - How to count consecutive Falses since the last occurrence of True in a time series without looping?

Is there a Pythonic solution with pandas for the given problem?

Suppose I have a masked Series called A:

[False, True, False, False, False, True, False, False]

I want to get a Series which counts the False values since the last occurrence of True. For the example above, this would output something like:

[NaN, 0, 1, 2, 3, 0, 1, 2]

And, as a bonus, also aggregated to:

[NaN, 3, 2]

containing only the maximum lengths of all consecutive occurrences of False values after a True value.
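
For reference, here is a minimal setup reproducing the example (a sketch, assuming pandas is imported as pd):

import pandas as pd

# the masked Series from the example above
A = pd.Series([False, True, False, False, False, True, False, False])

# desired per-element result: [NaN, 0, 1, 2, 3, 0, 1, 2]
# desired aggregated result:  [NaN, 3, 2]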

Many thanks in advance

draj

Upvotes: 0

Views: 119

Answers (4)

Celius Stingher

Reputation: 18377

An adaptation of @Andy L.'s answer for a DataFrame:

import pandas as pd

df = pd.DataFrame({'values': [False, True, False, False, False, True, False, False]})
# running count of Falses, reset at each True
df['cumsum'] = (~df['values']).cumsum() - (~df['values']).cumsum().where(df['values']).ffill()
# keep the row just before each True plus the last row: these hold each run's length
grouped = pd.concat([df.loc[df[df['values'] == True].index - 1, :], df.tail(1)])

Output:

    values  cumsum
0    False     NaN
1     True     0.0
2    False     1.0
3    False     2.0
4    False     3.0
5     True     0.0
6    False     1.0
7    False     2.0

Grouped output:

    values  cumsum
0    False     NaN
4    False     3.0
7    False     2.0
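
Alternatively, a sketch of the same grouped result using a per-block maximum (grouping on the cumulative count of Trues, and reusing the df built above):

grouped_alt = df.groupby(df['values'].cumsum())['cumsum'].max()
# values
# 0    NaN
# 1    3.0
# 2    2.0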

Upvotes: 1

Andy L.

Reputation: 25259

Try this: count the cumulative Falses and subtract the count recorded at the most recent True, carried forward:

out = (~A).cumsum() - (~A).cumsum().where(A).ffill()

Out[1372]:
0    NaN
1    0.0
2    1.0
3    2.0
4    3.0
5    0.0
6    1.0
7    2.0
dtype: float64

If you want to get the sums, try this using out from above:

out_sum = out[A.shift(-1, fill_value=True) & out.ne(0)]

Out[1411]:
0    NaN
4    3.0
7    2.0
dtype: float64
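
To see why the sum mask works, here is an illustrative sketch (assuming pandas as pd): A.shift(-1, fill_value=True) is True wherever the next value is True (or at the very end of the Series), i.e. at the last row of each block, and out.ne(0) then drops the True rows themselves.

import pandas as pd

A = pd.Series([False, True, False, False, False, True, False, False])
out = (~A).cumsum() - (~A).cumsum().where(A).ffill()

mask = A.shift(-1, fill_value=True) & out.ne(0)
mask.tolist()
# [True, False, False, False, True, False, False, True]  -> keeps rows 0, 4 and 7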

Upvotes: 3

Juan C

Reputation: 6132

If you want to work only with Series, you can adapt @kiki's answer this way:

import numpy as np
import pandas as pd
s = pd.Series([False, True, False, False, False, True, False, False])
(s.groupby(s.cumsum()).count() - 1).replace(0, np.nan).tolist()

Anyway, I think that if you want to understand what's happening under the hood, @kiki's answer is a bit more transparent.

Output:

[nan, 3.0, 2.0]

Also, for the complete Series it's just:

(s.groupby(s.cumsum()).cumcount())

Output 2:

0    0
1    0
2    1
3    2
4    3
5    0
6    1
7    2

Please tell me if having a zero instead of a nan is a problem in the first row.
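
If that leading zero should be a NaN (no True seen yet), a possible sketch is to mask everything before the first True, reusing s from above:

counts = s.groupby(s.cumsum()).cumcount()
counts = counts.mask(s.cumsum().eq(0))   # NaN out positions before the first True
counts.tolist()
# [nan, 0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0]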

Upvotes: 3

kiki

Reputation: 603

I think the cumsum function can help you create a kind of group id at each appearance of True. Then you can group by it and do what you need:

import pandas as pd

res = pd.DataFrame([False, True, False, False, False, True, False, False], columns=['val'])
res['cumsum'] = res.val.cumsum()
res.groupby("cumsum").count() - 1   # count of rows in each block, minus 1

Output:

        val
cumsum
0         0
1         3
2         2
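
For reference, the intermediate group id created by the cumsum looks like this (the values follow directly from the code above):

res['cumsum'].tolist()
# [0, 1, 1, 1, 1, 2, 2, 2]  -- each id marks a block starting at a True;
# group 0 is the leading Falses before any True, hence its count of 0 above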

Upvotes: 1
