Reputation: 118
Assuming a dataframe:
>>> data = pd.DataFrame([[9],[5],[1],[2]])
>>> data
   0
0  9
1  5
2  1
3  2
Say I want to add a column that compares each row with the previous 2 (or n) rows: if any of those numbers is higher than the current number, write False; otherwise write True, meaning that no number in the previous 2 (or n) rows is higher than the number at the current row.
Example:
   0 Highest
0  9    True
1  5    True
2  1   False
3  2     NaN
9 is higher than 5 and 1, 5 is higher than 1 and 2, but 1 is not higher than 2, and so on. I need to do this with n rows, where n ranges from 20 to 50+.
Upvotes: 2
Views: 100
Reputation: 29742
Using pandas.Series.rolling.max:
s = data[0]
data["Highest"] = s.eq(s[::-1].rolling(2).max())
print(data)
Output:
   0  Highest
0  9     True
1  5     True
2  1    False
3  2    False
Insight:
s[::-1]: given the OP's condition, the max comparison is done on the next n items. IMO, this is the same as comparing the series in reverse.
pd.Series.rolling: provides rolling window calculations over n items. In other words, it creates mini batches for local comparison. max is then applied to each window, as per the OP.
pd.Series.eq: provides element-wise comparison of self and input, giving a boolean array that says whether the given element (or row) is the highest.
Upvotes: 6