So I have a part of my code I want to optimize:
nan_rows = df.loc[df.Open.isna()].index
for i in nan_rows:
    # assumes a default RangeIndex, so index labels double as row positions
    df.loc[i, 'Open'] = df.loc[i - 1, 'Close']
It fills each NaN value in 'Open' with the previous row's value from another column, 'Close'. I find this code slow, and I often have to apply it to bigger DataFrames. Is there any way to optimize this? Thank you.
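For reference, a minimal self-contained reproduction of the loop, assuming a default RangeIndex (the loop treats index labels as positions) and a small hypothetical sample frame built with the DataFrame constructor instead of real market data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'Open' has gaps that should be filled from 'Close'
df = pd.DataFrame({
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Open": [1.0, np.nan, 3.0, np.nan, np.nan],
})

# Row-by-row fill: one .loc write per NaN row, which is what gets slow
# as the number of rows (and NaNs) grows.
nan_rows = df.index[df["Open"].isna()]
for i in nan_rows:
    df.loc[i, "Open"] = df.loc[i - 1, "Close"]

print(df["Open"].tolist())  # [1.0, 1.0, 3.0, 3.0, 4.0]
```

Note that a NaN in row 0 would raise a KeyError here, since there is no row -1.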
IIUC, this might work, even with multiple consecutive NaN values in 'Open':
import pandas as pd
# sample dataset for read_clipboard()
'''
Close Open
1.0 1.0
2.0 NaN
3.0 3.0
4.0 NaN
5.0 NaN
6.0 NaN
7.0 7.0
8.0 NaN
'''
df = pd.read_clipboard()
# print(df)
df input:
   Close  Open
0    1.0   1.0
1    2.0   NaN
2    3.0   3.0
3    4.0   NaN
4    5.0   NaN
5    6.0   NaN
6    7.0   7.0
7    8.0   NaN
df['Open'] = df['Open'].fillna(df['Close'].shift(1))
# print(df)
df output:
   Close  Open
0    1.0   1.0
1    2.0   1.0
2    3.0   3.0
3    4.0   3.0
4    5.0   4.0
5    6.0   5.0
6    7.0   7.0
7    8.0   7.0
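A minimal sketch checking that the fillna/shift one-liner gives the same result as the row-by-row loop on a larger random frame. The frame size, the roughly 30% NaN fraction, and the helper names fill_open_loop and fill_open_vectorized are hypothetical choices for illustration; timings will vary by machine, so none are claimed here.

```python
import time

import numpy as np
import pandas as pd


def fill_open_loop(df):
    """Row-by-row fill, as in the question (assumes a default RangeIndex)."""
    out = df.copy()
    for i in out.index[out["Open"].isna()]:
        out.loc[i, "Open"] = out.loc[i - 1, "Close"]
    return out


def fill_open_vectorized(df):
    """fillna + shift: fill each NaN from the previous row's Close."""
    out = df.copy()
    out["Open"] = out["Open"].fillna(out["Close"].shift(1))
    return out


# Hypothetical test frame: 10,000 rows, roughly 30% of 'Open' set to NaN
rng = np.random.default_rng(0)
n = 10_000
close = rng.random(n)
open_ = close.copy()
nan_mask = rng.random(n) < 0.3
nan_mask[0] = False  # keep row 0 valid so every NaN has a previous row
open_[nan_mask] = np.nan
df = pd.DataFrame({"Close": close, "Open": open_})

t0 = time.perf_counter()
loop_result = fill_open_loop(df)
t1 = time.perf_counter()
vec_result = fill_open_vectorized(df)
t2 = time.perf_counter()

print("same result:", loop_result.equals(vec_result))
print(f"loop: {t1 - t0:.3f}s, vectorized: {t2 - t1:.4f}s")
```

Because the fill reads from 'Close' rather than from 'Open', each NaN in a run of consecutive NaNs is filled independently, which is why recurring NaNs are handled. One edge case differs: if the very first 'Open' value is NaN, the vectorized version leaves it NaN (shift(1) has no earlier row), whereas the question's original df.Close.iloc[i-1] would wrap around to the last row's 'Close'.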