Speeding up row-by-row loop with if-condition in Python

Question

I have a dataset of 6 million rows with the columns symbol, timeStamp, open price and close price. I run the following loop, which is very simple but takes a very long time (if the open price is NaN, take the close price from the previous row):

for i in range(0,len(price2)):
    print(i)
    if np.isnan(price3.iloc[i,2]):
        price3.iloc[i,2]=price3.iloc[i-1,3]

How can I speed this loop up? As far as I know, I can switch to apply(), but how can I include the if-condition in it?

miradulo · Accepted Answer

Instead of the for loop, you can use pandas.Series.fillna with the shifted Series for the close price.

price3['open price'] = price3['open price'].fillna(price3['close price'].shift(1))

This is vectorized and so should be far faster than your for loop.
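To make the idea concrete, here is a minimal sketch on a toy frame. The column names follow the ones used above; the values and the four-row size are made up for illustration, and your real data may be laid out differently.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the answer, values are invented for illustration.
price3 = pd.DataFrame({
    "symbol": ["A", "A", "A", "A"],
    "timeStamp": pd.date_range("2020-01-01", periods=4, freq="D"),
    "open price": [10.0, np.nan, 12.0, np.nan],
    "close price": [10.5, 11.5, 12.5, 13.5],
})

# shift(1) moves every close price down one row, so row i lines up
# with the close price of row i-1.
prev_close = price3["close price"].shift(1)

# fillna only touches the NaN entries; existing open prices are left alone.
# The result is assigned back to the column.
price3["open price"] = price3["open price"].fillna(prev_close)

filled_open = price3["open price"].tolist()
```

One difference from the loop is worth knowing: for the first row, shift(1) yields NaN, so a missing open price there stays missing, whereas iloc[i-1] with i = 0 wraps around and reads the last row.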

Note that I am assuming price2 and price3 have the same length, so you may as well iterate over price3 in your loop.
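As for the if-condition asked about in the question: in vectorized code it usually becomes a boolean mask rather than a branch. The sketch below is one possible way to spell that out, again on made-up data with the same column names; it should give the same result as filling with the shifted close price, just with the condition written explicitly.

```python
import numpy as np
import pandas as pd

# Toy data with invented values; only the two relevant columns are included.
price3 = pd.DataFrame({
    "open price": [np.nan, 20.0, np.nan, 22.0],
    "close price": [19.5, 20.5, 21.5, 22.5],
})

# The "if" from the loop: True wherever the open price is NaN.
is_missing = price3["open price"].isna()

# Close price of the previous row, aligned to the current row's index.
prev_close = price3["close price"].shift(1)

# Write only into the rows where the condition holds.
price3.loc[is_missing, "open price"] = prev_close[is_missing]

result = price3["open price"].tolist()
```

The first row has no previous row, so its open price stays NaN here.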
