Salchipapas

Reputation: 118

Comparing column value with n rows value in row slice

Assuming a dataframe:

>>> data = pd.DataFrame([[9],[5],[1],[2]])
>>> data
   0
0  9
1  5
2  1
3  2

Say I want to add a column that compares each value with the next 2 (or n) rows: if any of those values is higher than the current one, write False; otherwise write True, meaning none of the next 2 (or n) rows is higher than the value in the current row.

Example:

   0  Highest
0  9   True
1  5   True
2  1   False
3  2   NaN

9 is higher than 5 and 1, 5 is higher than 1 and 2, but 1 is not higher than 2, and so on. I need to do this for n rows, with n ranging from 20 to 50 or more.

Upvotes: 2

Views: 100

Answers (1)

Chris

Reputation: 29742

Using pandas.Series.rolling.max:

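s = data[0]
# Reverse the series so each rolling(2) window covers the current row and the row after it,
# then flag the rows whose value equals that window's maximum.
data["Highest"] = s.eq(s[::-1].rolling(2).max())
print(data)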

Output:

   0  Highest
0  9     True
1  5     True
2  1    False
3  2    False

Insight:

  • s[::-1]: given the OP's condition, the max comparison is done on the next n items. IMO, this is the same as running a backward-looking comparison on the reversed series.
  • pd.Series.rolling: provides rolling-window calculations over n rows. In other words, it creates mini batches for local comparison. max is then applied to each window, as per the OP.
  • pd.Series.eq: provides an element-wise comparison of self and the input, aligned on the index, which yields a boolean Series marking whether each element (row) is the highest in its window.
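
To generalize this to n rows, here is a minimal sketch of the same reversed-rolling idea. The helper name is_highest_in_next is hypothetical (it is not part of pandas or of the code above). It rests on two assumptions: the rolling window counts the current row, so looking at the next n rows needs a window of n + 1; and min_periods=1 is used so rows near the end are compared against whatever rows remain.

```python
import pandas as pd


def is_highest_in_next(s: pd.Series, n: int) -> pd.Series:
    """Flag rows whose value is not exceeded by any of the next n rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    # The window includes the current row, so n following rows need n + 1.
    # min_periods=1 lets the last rows use the shorter window that is available.
    window_max = s[::-1].rolling(n + 1, min_periods=1).max()
    # eq aligns on the index, so the reversed result matches the original rows.
    return s.eq(window_max)


data = pd.DataFrame([[9], [5], [1], [2]])
data["Highest"] = is_highest_in_next(data[0], n=2)
print(data)
```

With min_periods=1 the last row comes out True, since no later row exceeds it. That differs from the NaN in the question's example and from the False that rolling(2) gives above, so drop min_periods if trailing rows should not count as highest. Ties also count as True, because eq only checks that the value equals the window maximum.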

Upvotes: 6
