Reputation: 303
I have a pandas df containing a time series, and from t(0), I need to look ahead to t(n) and see what are the maximum and minimum values that are ahead, in a slice of size defined by columns “from” and “to”.
This is my solution, which works but it is extremely slow:
df[‘max_ahead’] = df.apply(lambda x: df[‘value’][int(df[‘from’]):int(df[‘to’])].max(), axis=1)
df[‘min_ahead’] = df.apply(lambda x: df[‘value’][int(df[‘from’]):int(df[‘to’])].min(), axis=1)
Is there a way to speed this up in pandas or a numpy array? My df contains millions of rows, and the code above takes too long.
Upvotes: 3
Views: 2084
Reputation: 323366
Your moving window is fix 100, then we can apply rolling
here
For example : for this sample data frame I choosing moving window=2
df=pd.DataFrame({'V':[1,2,3,4,5,6,7,8,9,10]})
df.rolling(window=2).min()
Out[474]:
V
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
5 5.0
6 6.0
7 7.0
8 8.0
9 9.0
Upvotes: 1
Reputation: 5137
Since the window to slice seems to be constant (100 in your case), try this:
df['max_ahead'] = df[value].rolling(window=100).max()
df['max_ahead'] = df['max_ahead'].shift(-100)
The shift at the end recreates the solution you want, w/o having to use apply lambda, which can be slow.
Upvotes: 2