Fill na for a filter of dataframe

Question

I have a dataframe similar to below:

     col1    col2    col3    col4
0    101     1000    NaN     NaN
1    102     2000    51      1500
2    103     2500    52      2800
3    104     3600    53      NaN
4    105     2400    NaN     NaN
5    106     3600    54      NaN
6    107     1200    55      1800
7    108     1000    NaN     NaN
8    NaN     NaN     56      1200

Now, I need to fill the NaN values in col4 with the corresponding values from col2. So, if col4 is NaN, take the value from col2 and put it in col4.

However, the catch here is I need to do this only when col3 has some value. (Filtered Dataframe)

If I had to fill NaN values irrespective of filters, then the following would work:

df['col4'].fillna(0) # If I need to fill all NaN with zero values
df['col4'].fillna(df['col2']) # if I need to fill the corresponding col2 values in place of NaN

However, how do I fill the NaN values only in the filtered rows?

That is, in the above example, only rows 3 and 5 (corresponding to col3 values 53 and 54) should be filled with values from col2 (3600 and 3600), while col4 for rows 0, 4 and 7 should stay NaN.

The following won't work, because the filtered selection is a copy holding only a subset of the column, so the in-place fill never reaches df:

df[df['col3'].notnull()]['col4'].fillna(df['col2'],inplace=True) #will not work

Any way we can do this without a loop, as the dataset has more than 2 million rows?

Anurag Dabas · Accepted Answer

Use notna() to create a boolean mask:

cond=df['col3'].notna()

Then pass that mask to the loc accessor and fill the values conditionally with fillna():

df.loc[cond,'col4']=df.loc[cond,'col4'].fillna(df.loc[cond,'col2'])
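To check the behaviour on the data from the question, here is a minimal runnable sketch. The DataFrame is rebuilt by hand from the table above (the dict construction is an assumption about how the data is loaded); the mask and the loc assignment are the same two steps as in the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "col1": [101, 102, 103, 104, 105, 106, 107, 108, np.nan],
    "col2": [1000, 2000, 2500, 3600, 2400, 3600, 1200, 1000, np.nan],
    "col3": [np.nan, 51, 52, 53, np.nan, 54, 55, np.nan, 56],
    "col4": [np.nan, 1500, 2800, np.nan, np.nan, np.nan, 1800, np.nan, 1200],
})

# Boolean mask: True where col3 has a value.
cond = df["col3"].notna()

# Fill NaN in col4 from col2, but only on the masked rows.
# Assigning through .loc writes back into df itself, not into a copy.
df.loc[cond, "col4"] = df.loc[cond, "col4"].fillna(df.loc[cond, "col2"])

print(df)
```

Rows 3 and 5 end up with 3600 in col4, while rows 0, 4 and 7 (where col3 is NaN) keep NaN. The whole operation is vectorized, so it should scale to a few million rows without an explicit loop.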
