Efficient way to “look ahead” values in a pandas df

Question

I have a pandas df containing a time series. From each row t(0), I need to look ahead to t(n) and find the maximum and minimum values in that slice, whose bounds are given by the columns “from” and “to”.

This is my df:

This is my solution, which works but is extremely slow:

# For each row x, slice 'value' between that row's own 'from' and 'to' positions
df['max_ahead'] = df.apply(lambda x: df['value'][int(x['from']):int(x['to'])].max(), axis=1)
df['min_ahead'] = df.apply(lambda x: df['value'][int(x['from']):int(x['to'])].min(), axis=1)

Is there a way to speed this up with pandas or NumPy? My df contains millions of rows, and the code above takes too long.

Luis Miguel · Accepted Answer

Since the window to slice seems to be constant (100 in your case), try this:

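# Rolling max over the 100 rows ending at each row
df['max_ahead'] = df['value'].rolling(window=100).max()
# Shift back by 100 so each row holds the max of the 100 rows that follow it
df['max_ahead'] = df['max_ahead'].shift(-100)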

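The rolling max at a given row covers that row and the 99 rows before it, so the shift at the end moves each result back to the row it should look ahead from. That reproduces the result you want without a row-wise apply with a lambda, which can be slow. The same pattern with .min() gives min_ahead, and the last 100 rows come out as NaN because they have no full window ahead of them.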
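To make the alignment concrete, here is a minimal sketch of the same rolling-plus-shift idea for both columns. The sample data, the small window size n = 5, and the variable names are assumptions for illustration only, since the original df is not shown in the question. It also assumes the window is constant and starts at the row after the current one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the df from the question is not available.
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(0, 100, size=20)})

n = 5  # look-ahead window size (assumed constant; 100 in the answer)

# rolling(n) at row j covers rows j-n+1..j. Shifting by -n moves the
# result computed at row i+n up to row i, so row i describes rows i+1..i+n.
rolled = df["value"].rolling(window=n)
df["max_ahead"] = rolled.max().shift(-n)
df["min_ahead"] = rolled.min().shift(-n)

# Brute-force check with plain slicing for every row that has a full window.
for i in range(len(df) - n):
    window = df["value"].iloc[i + 1 : i + 1 + n]
    assert df["max_ahead"].iloc[i] == window.max()
    assert df["min_ahead"].iloc[i] == window.min()
```

If the window should include the current row (rows i..i+n-1), shifting by -(n - 1) instead of -n should give that alignment. If “from” and “to” really vary per row, this fixed-window approach does not apply as written.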
