Igor K.

Reputation: 877

Find if value is max for previous rows

This is my DataFrame:

num val
0   77
1   78
2   78
3   79
4   80
5   81
6   79
7   83
8   85
9   86
10  87
11  88
12  89
13  90
14  91
15  90
16  92

I want to create a new bool column with the values True or False, depending on whether the current value is the maximum relative to the previous 4 rows (larger than the values in the 4 previous rows).
Expected output:

num val is_max
0   77  NaN
1   78  NaN
2   78  NaN
3   79  NaN
4   80  True
5   81  True
6   79  False
7   83  True
8   85  True
9   86  True
10  87  True
11  88  True
12  89  True
13  90  True
14  91  True
15  90  False
16  92  True

Upvotes: 3

Views: 1311

Answers (3)

Since rolling also includes the current row, you need to either shift the values by one or increase the window by one. Both versions are shown below:

df['is_max'] = df.val[n:] > df.val.shift(1).rolling(n).max().dropna()

or

df['is_max'] = df.val.iloc[n:].eq(df.val.rolling(n+1).max().dropna())

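where n is the number of previous rows to compare against (n = 4 here). Note that the second version uses eq, so a value that ties with the maximum of the previous n rows also counts as True, while the first version requires it to be strictly larger.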


Output:

    num  val is_max
0     0   77    NaN
1     1   78    NaN
2     2   78    NaN
3     3   79    NaN
4     4   80   True
5     5   81   True
6     6   79  False
7     7   83   True
8     8   85   True
9     9   86   True
10   10   87   True
11   11   88   True
12   12   89   True
13   13   90   True
14   14   91   True
15   15   90  False
16   16   92   True
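
To reproduce the output above, here is a minimal self-contained sketch of the shift-based version. The DataFrame is rebuilt from the question's data, and the helper name prev_max is only illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "num": range(17),
    "val": [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92],
})

n = 4  # number of previous rows to compare against

# Maximum of the n rows *before* each row: shift by one so the current
# row is excluded, then take a rolling max over n rows.
prev_max = df["val"].shift(1).rolling(n).max()

# Strictly greater than every one of the previous n rows.
# Rows without n predecessors are dropped here and become NaN on assignment.
df["is_max"] = df["val"].iloc[n:] > prev_max.dropna()

print(df)
```

The comparison is done only on rows that have n predecessors, which is why the first n rows end up as NaN rather than False.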

Note that the dtype of the "is_max" column is object because it mixes NaN and bool values, which a plain bool column cannot hold; pandas stores such a column as either object or float. However, pandas also provides a nullable boolean dtype, so you can convert the column with: df['is_max'] = df.is_max.astype('boolean').
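
As a small illustration of that conversion, the sketch below uses a short stand-in Series (the names is_max and is_max_bool are just for this example) rather than the full DataFrame:

```python
import numpy as np
import pandas as pd

# A small object-dtype column mixing NaN with booleans, as in "is_max".
is_max = pd.Series([np.nan, True, False, True], dtype=object)

# Convert to pandas' nullable boolean dtype; the NaN becomes pd.NA.
is_max_bool = is_max.astype("boolean")

print(is_max.dtype)       # object
print(is_max_bool.dtype)  # boolean
print(is_max_bool)
```

The nullable dtype keeps True and False as booleans while still marking the missing rows as missing.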

Upvotes: 2

Celius Stingher

Reputation: 18367

I believe this can be solved by evaluating the condition with the .rolling() function for the window you are calculating over. All in all the code would be as follows:

df['is_max'] = df['val'].rolling(4).max() > df['val']

That line flags the opposite of what you want, and rolling(4) also includes the current row, so it only looks back 3 rows. To match your expected output, compare each value against the maximum of the previous 4 rows (shift by one first), flip the comparison, and keep the first 4 rows as np.nan:

prev_max = df['val'].shift(1).rolling(4).max()
df['is_max'] = np.where(prev_max.isna(), np.nan, df['val'] > prev_max)

Because np.where has to combine np.nan with the True / False result in a single array, the output is a float column in which True and False become 1.0 and 0.0 (which represent the same thing). A plain bool column cannot hold NaN, so once NaNs are added the values are no longer stored as native booleans.
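
The dtype behaviour can be checked with a short sketch. It assumes the comparison is made against the previous 4 rows via shift(1), uses only the first seven values from the question, and the names val, prev_max and flags are illustrative.

```python
import numpy as np
import pandas as pd

val = pd.Series([77, 78, 78, 79, 80, 81, 79])

# Max of the 4 rows before each row (NaN where fewer than 4 exist).
prev_max = val.shift(1).rolling(4).max()

# np.where has to fit np.nan and booleans into one array, so it picks float.
flags = np.where(prev_max.isna(), np.nan, val > prev_max)

print(flags.dtype)
print(flags)
```

The result is a NumPy float array: NaN for the first four rows, then 1.0 or 0.0 in place of True or False.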

Upvotes: 4

wwnde

Reputation: 26676

Group the rows into consecutive blocks of 4 and check whether each value equals the expanding max within its block:

df.val.eq(df.groupby(df.index//4).val.transform(lambda x: x.expanding().max()))
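
To see what this expression computes, here is a small sketch on the question's data, with the steps split out (the names block, block_max and is_block_max are illustrative). Note that it compares within fixed blocks of 4 rows rather than a sliding window of the previous 4, so the first four rows come out True instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "val": [77, 78, 78, 79, 80, 81, 79, 83, 85, 86, 87, 88, 89, 90, 91, 90, 92],
})

# Label consecutive blocks of 4 rows: 0,0,0,0,1,1,1,1,...
block = df.index // 4

# Running (expanding) max inside each block.
block_max = df.groupby(block)["val"].transform(lambda x: x.expanding().max())

# True where the value equals the running max of its own block.
df["is_block_max"] = df["val"].eq(block_max)

print(df)
```

On this particular data the rows from index 4 onward happen to agree with the expected output, but that is not guaranteed in general, since each block starts its running max from scratch.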

Upvotes: 2
