user23213128

Python Pandas, find index of a rolling window - is there a more efficient method?

I have a DataFrame of timestamps and bid prices. For every bid price I want to take a rolling window of the previous 1000 bid prices, find the highest value in that window, and return the position at which that high occurred. The position must be relative to the rolling window, so it will be a number between 1 and 1000. Here is the code I came up with, which works:

df['High time'] = price_frame - ( df.index.values - df['Bid'].rolling(window=price_frame).agg(lambda x: x.index.values[np.argmax(x.values)]) ) 

As stated, it works, but it is highly inefficient: it takes about 15 to 20 seconds to compute. Is there a more efficient way to do this? Thanks.

I am hoping for faster code, if possible.

Edit: example

I have this dataframe:

    Timestamp   Bid
0   2023-12-01 00:00:02.033000+00:00    1.26305
1   2023-12-01 00:00:05.974000+00:00    1.26306
2   2023-12-01 00:00:07.587000+00:00    1.26305
3   2023-12-01 00:00:07.689000+00:00    1.26306
4   2023-12-01 00:00:07.792000+00:00    1.26304
... ... ...
2383418 2024-01-04 23:59:36.774000+00:00    1.26810
2383419 2024-01-04 23:59:41.153000+00:00    1.26808
2383420 2024-01-04 23:59:41.256000+00:00    1.26808
2383421 2024-01-04 23:59:55.951000+00:00    1.26805
2383422 2024-01-04 23:59:58.573000+00:00    1.26807

For each row I want to take a rolling window of the 1000 previous bid prices, find the max value in that window, and return the position within the window where that high occurred. Using this code:

df['High time'] = price_frame - (df.index.values - df['Bid'].rolling(window=price_frame).agg(lambda x: x.index.values[np.argmax(x.values)]))

I obtain the desired result:

    Timestamp   Bid High time
0   2023-12-01 00:00:02.033000+00:00    1.26305 NaN
1   2023-12-01 00:00:05.974000+00:00    1.26306 NaN
2   2023-12-01 00:00:07.587000+00:00    1.26305 NaN
3   2023-12-01 00:00:07.689000+00:00    1.26306 NaN
4   2023-12-01 00:00:07.792000+00:00    1.26304 NaN
... ... ... ...
2383418 2024-01-04 23:59:36.774000+00:00    1.26810 255.0
2383419 2024-01-04 23:59:41.153000+00:00    1.26808 254.0
2383420 2024-01-04 23:59:41.256000+00:00    1.26808 253.0
2383421 2024-01-04 23:59:55.951000+00:00    1.26805 252.0
2383422 2024-01-04 23:59:58.573000+00:00    1.26807 251.0

The problem is that it takes up to 20 seconds. Hence I wanted to know if there's a faster way.

Upvotes: 0

Views: 387

Answers (2)

Trostis

Reputation: 73

Does something like this do the trick?

df['High time'] = [window.to_list().index(max(window.to_list())) for window in df['Bid'].rolling(price_frame)]
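
The position inside each window can also be computed without a Python-level loop. The following is only a minimal sketch, not a measured benchmark: it assumes NumPy 1.20 or later (for sliding_window_view), no NaN values in the Bid column, and the 1-based positions described in the question. The helper name rolling_argmax_position is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_argmax_position(values, window):
    """Return the 1-based position of the first maximum in each trailing window.

    Rows that do not yet have a full window are left as NaN.
    Assumes the input contains no NaN values.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape[0], np.nan)
    if values.shape[0] >= window:
        # Read-only view of shape (n - window + 1, window); no data is copied.
        windows = sliding_window_view(values, window)
        # argmax gives the 0-based position of the first maximum; add 1 for 1-based.
        out[window - 1:] = windows.argmax(axis=1) + 1
    return out
```

With the question's names, this would be used as df['High time'] = rolling_argmax_position(df['Bid'].to_numpy(), price_frame). When the maximum occurs more than once in a window, the first occurrence is reported, as np.argmax does in the question's code.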

Upvotes: 0

Triky

Reputation: 441

Probably not a lot faster, but a bit simpler. idxmax() returns the index label of the max value. Unfortunately it doesn't work directly with rolling, which is why I used apply.

df['High time'] = df['Bid'].rolling(price_frame).apply(lambda x: x.idxmax())
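
A possible variant of the same idea, offered as a sketch rather than a tested speed-up: passing raw=True makes rolling.apply hand each window to the function as a plain NumPy array instead of building a Series, and np.argmax then returns the position inside the window rather than a label from the DataFrame index. The function name rolling_high_position and the small demo Series are assumptions for illustration; the + 1 converts to the 1-based position the question asks for.

```python
import numpy as np
import pandas as pd


def rolling_high_position(bid, window):
    """1-based position of the first maximum within each rolling window."""
    # raw=True passes each window as a NumPy array, avoiding per-window Series creation.
    return bid.rolling(window).apply(np.argmax, raw=True) + 1


# Small illustrative input (not the question's data).
demo = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])
result = rolling_high_position(demo, 3)
```

With the question's names, this would read df['High time'] = rolling_high_position(df['Bid'], price_frame). Rows without a full window stay NaN, matching the output shown in the question.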

Upvotes: 0
