Reputation: 541
I have a dataframe similar to the one below:
   col1  col2  col3  col4
0   101  1000   NaN   NaN
1   102  2000    51  1500
2   103  2500    52  2800
3   104  3600    53   NaN
4   105  2400   NaN   NaN
5   106  3600    54   NaN
6   107  1200    55  1800
7   108  1000   NaN   NaN
8   NaN   NaN    56  1200
Now, I need to fill the NaN values in col4 with the corresponding values in col2. So, if col4 is NaN, take the value from col2 and put it in col4.
However, the catch is that I need to do this only for rows where col3 has a value (i.e. on a filtered dataframe).
If I had to fill the NaN values irrespective of any filter, then the following would work:
df['col4'].fillna(0) # If I need to fill all NaN with zero values
df['col4'].fillna(df['col2']) # if I need to fill the corresponding col2 values in place of NaN
However, how do I fill the NaN values only for the filtered rows?
That is, in the above example, only rows 3 and 5 (corresponding to col3 values 53 and 54) should be filled with values from col2 (3600 and 3600), while col4 for rows 0, 4 and 7 should stay NaN.
The following won't work, because the filtered selection is a copy of a subset of the column, so the in-place fill never reaches the original dataframe:
df[df['col3'].notnull()]['col4'].fillna(df['col2'], inplace=True)  # will not work
Any way we can do this without a loop, as the dataset has more than 2 million rows?
Upvotes: 2
Views: 1102
Reputation: 24314
Try using notna() to create a boolean mask:
cond=df['col3'].notna()
Finally, pass that mask to the loc accessor and fill the values conditionally with fillna():
df.loc[cond,'col4']=df.loc[cond,'col4'].fillna(df.loc[cond,'col2'])
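For reference, here is a minimal sketch of the same approach run end to end. The dataframe is rebuilt by hand from the sample in the question (the use of np.nan for the missing cells and the resulting float dtypes are assumptions about the real data):

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data rebuilt from the question; missing cells are np.nan,
# so every column ends up with a float dtype.
df = pd.DataFrame({
    "col1": [101, 102, 103, 104, 105, 106, 107, 108, nan],
    "col2": [1000, 2000, 2500, 3600, 2400, 3600, 1200, 1000, nan],
    "col3": [nan, 51, 52, 53, nan, 54, 55, nan, 56],
    "col4": [nan, 1500, 2800, nan, nan, nan, 1800, nan, 1200],
})

# Boolean mask: True for rows where col3 has a value.
cond = df["col3"].notna()

# Fill col4 from col2, but only on the masked rows. Assigning through
# .loc writes back into the original dataframe.
df.loc[cond, "col4"] = df.loc[cond, "col4"].fillna(df.loc[cond, "col2"])

print(df)
```

With this sample, rows 3 and 5 should pick up 3600 from col2, while rows 0, 4 and 7 (where col3 is NaN) keep NaN in col4. The whole thing is vectorized, so no Python loop is needed for a large dataset.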
Upvotes: 2