Reputation: 189
I have a dataset of 6 million rows; the columns are symbol, timeStamp, open price and close price. I run the following loop, which takes very long despite being very simple (if open price is NaN, take close price from the previous row):
for i in range(0,len(price2)):
    print(i)
    if np.isnan(price3.iloc[i,2]):
        price3.iloc[i,2]=price3.iloc[i-1,3]
How can I speed this loop up? As far as I know, I can switch to apply(), but how can I include the if-condition in it?
Upvotes: 3
Views: 69
Reputation: 29710
Instead of the for loop, you can use pandas.Series.fillna with the shifted Series for the close price.
price3['open price'] = price3['open price'].fillna(price3['close price'].shift(1))
This is vectorized and so should be far faster than your for loop.
Note I am assuming that price2 and price3 have the same length, so you may as well be iterating over price3 in your loop.
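As a rough illustration of the fillna-with-shift idea, here is a minimal sketch on a made-up four-row frame. The data values are hypothetical; only the column names come from the question. It assigns the result back to the column rather than relying on inplace=True on a column selection, which recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the question.
price3 = pd.DataFrame({
    "symbol": ["A", "A", "A", "A"],
    "timeStamp": pd.date_range("2020-01-01", periods=4, freq="D"),
    "open price": [10.0, np.nan, 12.0, np.nan],
    "close price": [10.5, 11.5, 12.5, 13.5],
})

# shift(1) moves every close price down one row, so row i holds the
# close price of row i-1 (the first row becomes NaN).
prev_close = price3["close price"].shift(1)

# Fill only the missing open prices with the previous row's close price.
price3["open price"] = price3["open price"].fillna(prev_close)

print(price3["open price"].tolist())  # [10.0, 10.5, 12.0, 12.5]
```

One difference from the original loop: at i = 0 the loop reads iloc[-1], i.e. the last row's close price, whereas shift(1) leaves a missing first open price as NaN, since there is no previous row.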
Upvotes: 3