Reputation: 37
I have this dataframe:
In[1]df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
In[2]df
Out[2]:
0 1 2 3 4
0 1 2 3 4 5
1 6 7 8 9 10
2 11 12 13 14 15
3 16 17 18 19 20
4 21 22 23 24 25
I need to achieve this:
The resulting dataframe df1 should be same size with True of False in it based on the above stated criteria:
In[3]df1
Out[3]:
0 1 2 3 4
0 NaN NaN False False False
1 NaN NaN False False False
2 NaN NaN True True True
3 NaN NaN True True True
4 NaN NaN True True True
I figured dataframe.rolling.apply() with a function might be the solution, but how exactly?
Upvotes: 2
Views: 16950
Reputation: 1343
You are right that using rolling()
is the way to go. However, you must keep in mind since rolling()
replaces the value at end of the window with the new value, so you can not just mark the window with True
you will also get False
whenever the condition is not applicable
Here is the code that uses your sample dataframe and performs the desired transformation:
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
now, defining a function that takes a window as an argument and returns whether the condition is satisfied
def fun(x):
num = 0
for i in x:
num += 1 if i > 10 else 0
return 1 if num >= 2 else -1
I have hardcoded the threshold as 10. So if in any window the numbers of values greater than 10 are greater than or equal to 2 than the last value is replaced by 1 (denoting True), else it is replaced by -1(denoting False).
If you want to keep threshold parameters as variables, then have a look at this answer to pass them as arguments.
Now applying the function on rolling window, using window size as 3, axis 1 and additionally if you don't want NaN then you can also set min_periods to 1 in the arguments.
df.rolling(3, axis=1).apply(fun)
produces the output as
0 1 2 3 4
0 NaN NaN -1.0 -1.0 -1.0
1 NaN NaN -1.0 -1.0 -1.0
2 NaN NaN 1.0 1.0 1.0
3 NaN NaN 1.0 1.0 1.0
4 NaN NaN 1.0 1.0 1.0
Upvotes: 6
Reputation: 294258
Use sum
on a boolean dataframe.
df.gt(10).rolling(3, axis=1).sum().ge(2)
0 1 2 3 4
0 False False False False False
1 False False False False False
2 False False True True True
3 False False True True True
4 False False True True True
You can nail down the exact requested output by masking where na.
df.gt(10).rolling(3, axis=1).sum().pipe(lambda d: d.ge(2).mask(d.isna()))
0 1 2 3 4
0 NaN NaN False False False
1 NaN NaN False False False
2 NaN NaN True True True
3 NaN NaN True True True
4 NaN NaN True True True
Upvotes: 3
Reputation: 9081
You need -
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
df1 = df.apply(lambda x: pd.Series([np.nan, np.nan]+[all(j>10 for j in i) for i in zip(x[0::1], x[1::1], x[2::1])]), axis=1)
print(df1)
Output
0 1 2 3 4
0 NaN NaN False False False
1 NaN NaN False False False
2 NaN NaN True True True
3 NaN NaN True True True
4 NaN NaN True True True
Explanation
list(zip(x[0::1], x[1::1], x[2::1])
breaks it down to taking 3 columns at a time for every row -
0 [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
1 [(6, 7, 8), (7, 8, 9), (8, 9, 10)]
2 [(11, 12, 13), (12, 13, 14), (13, 14, 15)]
3 [(16, 17, 18), (17, 18, 19), (18, 19, 20)]
4 [(21, 22, 23), (22, 23, 24), (23, 24, 25)]
all(j>10 for j in i)
Checks for each element in the list of tuples and then outputs True
if all the elements in the tuple are greater than 10
Concatenating [np.nan, np.nan]
to match your output. Hope that helps.
Upvotes: 0