Reputation: 73
I have this dataframe in pandas:
   col1    col2
0     1    -0.5
1     2    -4.0
2     8     3.0
3     9     6.0
4     4   345.0
5     6    -7.0
6     7  3456.0
7    47     3.0
8     2     5.0
9     4   -78.0
I want to return only those rows where the "col2" values of the previous 3 rows are all greater than 0. In this case:
5 6 -7.0
and
9 4 -78.0
I have tried it with rolling but can't find a way. Can you help me, please? :)
Upvotes: 2
Views: 464
Reputation: 23217
You can use .rolling(3) on col2 to look at a window of 3 entries and then use gt(0).all() to check that all 3 entries are > 0. Because you want to check the previous 3 entries, not including the current entry, we further use .shift(). Then use .loc to locate such entries:
df.loc[df['col2'].rolling(3).apply(lambda x: x.gt(0).all()).shift() > 0]
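To make the one-liner easier to follow, here is a minimal, self-contained sketch that rebuilds the frame from the question and splits the expression into named steps. The variable names are my own, and the lambda returns an explicit float with raw=True (an assumption on my part to keep the callback's return type unambiguous); the logic is otherwise the same.

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({
    "col1": [1, 2, 8, 9, 4, 6, 7, 47, 2, 4],
    "col2": [-0.5, -4.0, 3.0, 6.0, 345.0, -7.0, 3456.0, 3.0, 5.0, -78.0],
})

# 1.0 where the 3-row window ending at this row is entirely positive, else 0.0
# (the first two rows are NaN because the window is not yet full)
all_positive = df["col2"].rolling(3).apply(
    lambda w: float((w > 0).all()), raw=True
)

# Shift down one row so each row sees the window that ended on the previous row
prev_three_positive = all_positive.shift() > 0

result = df.loc[prev_three_positive]
print(result)
```

The NaN values introduced by the incomplete windows and by the shift compare as False, so those rows are dropped without extra handling.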
Another solution without .apply(), inspired by Tom, is as follows:
df.loc[df['col2'].shift().gt(0).rolling(3).min().eq(1)]
Here, we use .rolling(3).min().eq(1) on the boolean series produced by gt(0) to check that all 3 entries have a minimum of 1 (meaning that all entries must be True, since True is treated as 1 in calculations and False as 0). Effectively, we get the same result as gt(0).all() in the previous solution.
This approach has the advantage that, whatever the rolling window size, we still check for min().eq(1), so this part of the code needs no adjustment when the window size changes.
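As a rough illustration of that point, the window size can be turned into a parameter. The helper name rows_after_positive_run and its arguments are hypothetical, not part of pandas; this is only a sketch of the same shift / gt / rolling / min chain.

```python
import pandas as pd

def rows_after_positive_run(frame, column, n):
    """Hypothetical helper: rows whose previous n values in `column` are all > 0."""
    # True where the previous row's value is positive (NaN from shift gives False)
    prev_positive = frame[column].shift().gt(0)
    # A window minimum of 1 means every flag in the window is True
    mask = prev_positive.rolling(n).min().eq(1)
    return frame.loc[mask]

df = pd.DataFrame({
    "col1": [1, 2, 8, 9, 4, 6, 7, 47, 2, 4],
    "col2": [-0.5, -4.0, 3.0, 6.0, 345.0, -7.0, 3456.0, 3.0, 5.0, -78.0],
})

three = rows_after_positive_run(df, "col2", 3)
two = rows_after_positive_run(df, "col2", 2)
print(three)
print(two)
```

Only the argument n changes between the two calls; the min().eq(1) check stays the same.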
Result:
   col1  col2
5     6  -7.0
9     4 -78.0
Upvotes: 3
Reputation: 8790
Very similar to the answer by SeaBean, but without using apply:
>>> df[df['col2'].shift().gt(0).rolling(3).sum().eq(3)]
   col1  col2
5     6  -7.0
9     4 -78.0
You use shift so that you don't have to consider the previous 3 rows, but rather each row and the 2 before it. Then make a mask of positive values (.gt(0)), and with a rolling window of 3 over that mask, check which windows have a sum of 3.
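If it helps to see the intermediate values, the chain can be laid out column by column. This is just a sketch; the steps frame and its column names are my own, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 8, 9, 4, 6, 7, 47, 2, 4],
    "col2": [-0.5, -4.0, 3.0, 6.0, 345.0, -7.0, 3456.0, 3.0, 5.0, -78.0],
})

steps = pd.DataFrame({
    "col2": df["col2"],
    "prev": df["col2"].shift(),             # value of the previous row
    "prev_gt_0": df["col2"].shift().gt(0),  # mask: was the previous value positive?
})
# Number of True flags in each 3-row window of the mask
steps["window_sum"] = steps["prev_gt_0"].rolling(3).sum()
print(steps)

# Keep the rows where all 3 flags in the window are True
result = df[steps["window_sum"].eq(3)]
print(result)
```

Only rows 5 and 9 reach a window sum of 3, which matches the output above.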
Upvotes: 5