ericthegreat
ericthegreat

Reputation: 303

Efficient way to “look ahead” values in a pandas df

I have a pandas df containing a time series, and from t(0), I need to look ahead to t(n) and see what are the maximum and minimum values that are ahead, in a slice of size defined by columns “from” and “to”.

This is my df: enter image description here

This is my solution, which works but it is extremely slow:

df[‘max_ahead’] = df.apply(lambda x: df[‘value’][int(df[‘from’]):int(df[‘to’])].max(), axis=1)
df[‘min_ahead’] = df.apply(lambda x: df[‘value’][int(df[‘from’]):int(df[‘to’])].min(), axis=1)

Is there a way to speed this up in pandas or a numpy array? My df contains millions of rows, and the code above takes too long.

Upvotes: 3

Views: 2084

Answers (2)

BENY
BENY

Reputation: 323366

Your moving window is fix 100, then we can apply rolling here For example : for this sample data frame I choosing moving window=2

df=pd.DataFrame({'V':[1,2,3,4,5,6,7,8,9,10]})
df.rolling(window=2).min()
Out[474]: 
     V
0  NaN
1  1.0
2  2.0
3  3.0
4  4.0
5  5.0
6  6.0
7  7.0
8  8.0
9  9.0

Upvotes: 1

Luis Miguel
Luis Miguel

Reputation: 5137

Since the window to slice seems to be constant (100 in your case), try this:

df['max_ahead'] = df[value].rolling(window=100).max()
df['max_ahead'] = df['max_ahead'].shift(-100)

The shift at the end recreates the solution you want, w/o having to use apply lambda, which can be slow.

Upvotes: 2

Related Questions