d9ngle

Reputation: 1469

Find highest value among the next K-consecutive rows in Pandas?

I'm working with time-series price data, and for every row I want to know the highest price reached in the next K rows.

I could probably do this with .argmax() while filtering the DataFrame by time, but there must be a simpler built-in solution.

For example:

  Price
1 $10
2 $11
3 $15
4 $18
5 $13
6 $4
7 $25

For K=2, here's what I want:

  Price   Highest_In_Next_2_Rows
1 $10     $15
2 $11     $18
3 $15     $18
4 $18     $13
5 $13     $25
6 $4      $25
7 $25     NaN

Upvotes: 7

Views: 4236

Answers (1)

FLab

Reputation: 7466

You can achieve this using the pandas rolling and shift functions.

Essentially, you compute the rolling max over the previous k observations and then shift the series back by k, so that the value at t is the max calculated over (t+1, ..., t+k).

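import pandas as pd

ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2

# Rolling max over the previous k rows, then shift back by k rows
# so that row t holds the max over rows t+1, ..., t+k.
res = ts.rolling(k).max().shift(-k)

pd.concat([ts, res], axis=1)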

Output:

#     0     1
# 0  10  15.0
# 1  11  18.0
# 2  15  18.0
# 3  18  13.0
# 4  13  25.0
# 5   4   NaN
# 6  25   NaN

The problem with this solution is that it returns NaN for the last k observations, even when some future rows exist. Here row 5 still has one later value (25), but its result is NaN.

A workaround is the following: take the series in reverse order and calculate the rolling max over the past k observations, with min_periods=1 so that a result is produced whenever at least one observation is available. Then shift by one row, because the current row's own price should not be included, and reverse again to restore the original order.

res = ts[::-1].rolling(k, min_periods=1).max().shift(1)[::-1]

This reproduces the desired output exactly:

#     0     1
# 0  10  15.0
# 1  11  18.0
# 2  15  18.0
# 3  18  13.0
# 4  13  25.0
# 5   4  25.0
# 6  25   NaN
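
If it helps to see why the reversed one-liner works, here is a minimal sketch that splits it into named steps and checks it against a brute-force slice. The function name highest_in_next_k and the intermediate variable names are my own, chosen for illustration; the logic is the same reverse, rolling, shift, reverse chain as above.

```python
import pandas as pd


def highest_in_next_k(prices: pd.Series, k: int) -> pd.Series:
    """Max of the next k rows (current row excluded); NaN when no rows follow.

    Hypothetical helper name, used only for this illustration.
    """
    # In reversed order, "the next k rows" become "the previous k rows".
    reversed_prices = prices[::-1]
    # min_periods=1 keeps a result even when fewer than k rows are available.
    trailing_max = reversed_prices.rolling(k, min_periods=1).max()
    # Shift by one row so the current row's own price is excluded.
    without_current = trailing_max.shift(1)
    # Reverse again to restore the original order.
    return without_current[::-1]


ts = pd.Series([10, 11, 15, 18, 13, 4, 25])
k = 2
res = highest_in_next_k(ts, k)

# Brute-force reference: slice the next k rows explicitly for each position.
# The max of an empty slice is NaN, which covers the last row.
expected = pd.Series(
    [ts.iloc[i + 1 : i + 1 + k].max() for i in range(len(ts))],
    index=ts.index,
)

pd.testing.assert_series_equal(res, expected, check_dtype=False)
```

The brute-force version is slower but easy to reason about, so it is a convenient check if you change k or run this on your own data.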

Upvotes: 13
