Reputation: 877
This is my DataFrame:
num val
0 77
1 78
2 78
3 79
4 80
5 81
6 79
7 83
8 85
9 86
10 87
11 88
12 89
13 90
14 91
15 90
16 92
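I want to create a new bool column with values True or False, depending on whether the current value is the maximum compared with the previous 4 rows (i.e. larger than each of the values in the 4 previous rows). The first 4 rows, which do not have 4 previous rows, should be NaN.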
Expected output:
num val is_max
0 77 NaN
1 78 NaN
2 78 NaN
3 79 NaN
4 80 True
5 81 True
6 79 False
7 83 True
8 85 True
9 86 True
10 87 True
11 88 True
12 89 True
13 90 True
14 91 True
15 90 False
16 92 True
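For anyone who wants to try the answers, here is a minimal sketch that rebuilds the DataFrame above and spells out the rule in plain Python. It assumes the default 0..16 index and that num just counts the rows. Row i is True when its value is strictly larger than the maximum of rows i-4 to i-1. The names vals, n and reference are my own, purely for illustration.

```python
import pandas as pd

# Rebuild the DataFrame from the question ("num" simply counts 0..16)
vals = [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92]
df = pd.DataFrame({'num': range(len(vals)), 'val': vals})

# Plain-Python statement of the rule: row i is "max" if its value is
# strictly larger than every value in rows i-n .. i-1; rows that do not
# have n previous rows get None (the NaN rows in the expected output)
n = 4
reference = [None if i < n else vals[i] > max(vals[i - n:i])
             for i in range(len(vals))]
print(reference)
```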
Upvotes: 3
Views: 1311
Reputation: 4929
Since rolling also takes the current row, you may need to either use shift or increase the window by one. Take a look at the code for each option below:
n = 4  # number of previous rows to compare against
df['is_max'] = df.val.iloc[n:] > df.val.shift(1).rolling(n).max().dropna()
or
df['is_max'] = df.val.iloc[n:].eq(df.val.rolling(n+1).max().dropna())
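where n is the number of previous rows to compare against (4 in your case). Note that the first variant is a strict comparison (>), while the second also returns True when the current value ties with the previous maximum.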
Output:
num val is_max
0 0 77 NaN
1 1 78 NaN
2 2 78 NaN
3 3 79 NaN
4 4 80 True
5 5 81 True
6 6 79 False
7 7 83 True
8 8 85 True
9 9 86 True
10 10 87 True
11 11 88 True
12 12 89 True
13 13 90 True
14 14 91 True
15 15 90 False
16 16 92 True
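A self-contained sketch that runs both variants on the question's data with n = 4. The helper names prev_max and win_max are illustrative, not part of pandas. The last few lines use a small hypothetical Series to show where the two variants differ: with a tie, the shift version returns False while the n+1 window version returns True.

```python
import pandas as pd

vals = [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92]
df = pd.DataFrame({'num': range(len(vals)), 'val': vals})
n = 4  # number of previous rows to compare against

# Variant 1: shift(1) drops the current row from the window, so
# rolling(n) covers exactly the n previous rows; strict ">" comparison
prev_max = df['val'].shift(1).rolling(n).max().dropna()
df['is_max'] = df['val'].iloc[n:] > prev_max

# Variant 2: a window of n+1 includes the current row; the value equals
# the window max when it is >= all n previous values (ties count as True)
win_max = df['val'].rolling(n + 1).max().dropna()
df['is_max_eq'] = df['val'].iloc[n:].eq(win_max)
print(df)

# Hypothetical tie case: the last value equals the previous maximum
s = pd.Series([1, 2, 3, 4, 4])
strict = s.iloc[n:] > s.shift(1).rolling(n).max().dropna()
non_strict = s.iloc[n:].eq(s.rolling(n + 1).max().dropna())
print(strict.tolist(), non_strict.tolist())  # [False] [True]
```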
Note that the dtype of the "is_max" column is object because it mixes NaN and bool values, which a plain NumPy bool column cannot hold, so pandas falls back to object (or float) dtype. However, pandas also provides a nullable boolean data type, so you can force the "is_max" column to boolean with: df['is_max'] = df.is_max.astype('boolean').
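A quick sketch of that conversion on a small hypothetical Series mixing NaN and bool values (the Series s is made up for illustration); after astype('boolean') the NaN entries become pandas' missing-value marker while True/False stay real booleans.

```python
import numpy as np
import pandas as pd

# A column mixing NaN with True/False, like the "is_max" column above
s = pd.Series([np.nan, np.nan, True, False], dtype=object)
print(s.dtype)  # object

# Convert to pandas' nullable boolean dtype; NaN becomes <NA>
b = s.astype('boolean')
print(b.dtype)  # boolean
print(b.tolist())
```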
Upvotes: 2
Reputation: 18367
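I believe this can be solved by evaluating the condition with the .rolling() function over the window you are calculating. A first attempt would be:
df['is_max'] = df['val'].rolling(4).max() > df['val']
However, this gives the negation of your expected output (True where the current value is not the window maximum), and since rolling(4) includes the current row, the window only covers the 3 previous rows. To compare against the 4 previous rows, use a window of 5, flip the comparison, and keep the rows without a full window (the first 4) as np.nan:
import numpy as np
df['is_max'] = np.where(df['val'].rolling(5).max().isna(), np.nan, df['val'] >= df['val'].rolling(5).max())
Because the window includes the current row, >= is True exactly when the current value is at least as large as each of the 4 previous values, so a tie with the previous maximum also counts as True.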
Because np.where mixes np.nan with the True/False results, NumPy upcasts the output to float64, so True and False become 1.0 and 0.0 (which carry the same meaning). A plain NumPy bool column cannot hold NaN, so adding NaNs always changes its dtype (to float here, or to object in other cases); to keep genuine True/False values alongside missing ones, pandas' nullable 'boolean' dtype can be used.
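Here is a runnable sketch of the np.where idea on the question's data, assuming the goal is to compare each value with the 4 previous rows: a window of 5 covers the current row plus those 4 rows, so >= means the current value is at least as large as each of them (ties count as True). The name win_max is illustrative. The printed dtype confirms the float conversion.

```python
import numpy as np
import pandas as pd

vals = [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92]
df = pd.DataFrame({'num': range(len(vals)), 'val': vals})

# Window of 5 = current row + the 4 previous rows; NaN until the window is full
win_max = df['val'].rolling(5).max()

# Keep NaN where the window is incomplete; otherwise the current value is
# the window max exactly when it is >= every previous value in the window
df['is_max'] = np.where(win_max.isna(), np.nan, df['val'] >= win_max)
print(df['is_max'].dtype)  # float64
```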
Upvotes: 4
Reputation: 26676
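Use groupby to split the rows into consecutive blocks of 4 (rows 0-3, 4-7, ...) and compare each value with the expanding max of its block:
df.val.eq(df.groupby(df.index//4).val.transform(lambda x: x.expanding().max()))
Note that this compares each value only with the earlier rows of its own block, not with a sliding window of the 4 previous rows, so the first row of each block is always True. It matches your expected output from row 4 onwards for this data, but not in general, and it returns True instead of NaN for rows 0-3.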
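To see how the block approach differs from a sliding window, here is a sketch that wraps the groupby expression in a helper and compares it with the shift/rolling approach (block_max and sliding_max are hypothetical helper names). On the question's data the block version gives True for rows 0-3 and matches the expected output from row 4 on; the made-up df2 shows a case where the first row of a block is smaller than an earlier value, and the two approaches disagree.

```python
import pandas as pd

vals = [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92]
df = pd.DataFrame({'num': range(len(vals)), 'val': vals})

def block_max(frame):
    # Fixed blocks 0-3, 4-7, ... via integer division of the index;
    # each value is compared with the running max of its own block
    grp = frame.groupby(frame.index // 4)['val']
    return frame['val'].eq(grp.transform(lambda x: x.expanding().max()))

def sliding_max(frame, n=4):
    # Each value is compared with the n previous rows (strictly greater)
    return frame['val'] > frame['val'].shift(1).rolling(n).max()

print(block_max(df).tolist())

# Hypothetical data where the two approaches disagree at row 4:
# 5 starts a new block, but it is smaller than the previous value 9
df2 = pd.DataFrame({'val': [1, 2, 3, 9, 5]})
print(block_max(df2).iloc[4], sliding_max(df2).iloc[4])  # True False
```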
Upvotes: 2