John83

Reputation: 97

Pandas: If condition on multiple columns having null values and fillna with 0

I have the dataframe below. My requirement is: if both columns in a row are np.nan, leave the row unchanged; if only one of the columns is empty, fill it with 0. I wrote this code, but it's not working. Why? Please suggest a fix.

import pandas as pd
import numpy as np

data = {'Age': [np.nan, np.nan, 22, np.nan, 50,99],
        'Salary': [217, np.nan, 262, 352, 570, np.nan]}
df = pd.DataFrame(data)
print(df)

cond1 = (df['Age'].isnull()) & (df['Salary'].isnull())
if cond1 is False:
    df['Age'] = df['Age'].fillna(0)
    df['Salary'] = df['Salary'].fillna(0)

print(df)

Upvotes: 1

Views: 2524

Answers (4)

sammywemmy

Reputation: 28729

Get the rows that are all nulls and use where to exclude them during the fill:

bools = df.isna().all(axis = 1)
df.where(bools, df.fillna(0))
    Age  Salary
0   0.0   217.0
1   NaN     NaN
2  22.0   262.0
3   0.0   352.0
4  50.0   570.0
5  99.0     0.0

Your if statement won't work because cond1 is a Series holding one True/False value per row, not a single boolean. `cond1 is False` is an identity check, so it always evaluates to False no matter what the Series contains, and the fill never runs. You need to check each row separately.
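To make this concrete, here is a minimal sketch (assuming the `df` from the question; the names `identity_check`, `mask_values` and `ambiguous` are only for illustration) that compares the identity check with the row-wise mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [np.nan, np.nan, 22, np.nan, 50, 99],
                   'Salary': [217, np.nan, 262, 352, 570, np.nan]})

cond1 = df['Age'].isnull() & df['Salary'].isnull()

# `is` tests object identity: a Series is never the singleton False,
# so this is always False and the body of the `if` is skipped.
identity_check = cond1 is False

# The mask itself holds one boolean per row.
mask_values = cond1.tolist()

# Using a multi-element Series directly in a boolean context raises ValueError.
try:
    bool(cond1)
    ambiguous = False
except ValueError:
    ambiguous = True

print(identity_check, mask_values, ambiguous)
```

Only row 1 has both columns null, so the mask is True there and False everywhere else.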

An inefficient way would be to traverse the rows:

for row, index in zip(cond1, df.index):
    if not row:
        df.loc[index] = df.loc[index].fillna(0)

Apart from the inefficiency, you have to keep track of index positions yourself; the pandas options abstract that away while being quite efficient, since the looping happens in C.
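The same row selection can also be written without an explicit loop. This is just a sketch, assuming the `df` from the question; `rows` is a name introduced here for the inverted mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [np.nan, np.nan, 22, np.nan, 50, 99],
                   'Salary': [217, np.nan, 262, 352, 570, np.nan]})

cond1 = df['Age'].isnull() & df['Salary'].isnull()

# Rows where at least one column has a value.
rows = ~cond1

# Fill NaN with 0 only in those rows; all-NaN rows are left untouched.
df.loc[rows] = df.loc[rows].fillna(0)

print(df)
```

Here the boolean Series selects the rows in one step, so there is no index bookkeeping to do by hand.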

Upvotes: 0

Daniel Weigel

Reputation: 1137

tmp=df.loc[(df['Age'].isna() & df['Salary'].isna())]
df.fillna(0,inplace=True)
df.loc[tmp.index]=np.nan

This might be a bit less sophisticated than the other answers, but it worked for me:

  • I first save the row(s) where both columns are NaN at the same time
  • then fillna the original df as normal
  • then set np.nan back on the saved rows, using their index

Upvotes: 0

BENY

Reputation: 323376

You can just assign it with update:

c = ['Age','Salary']
df.update(df.loc[~df[c].isna().all(axis=1), c].fillna(0))

df
Out[341]: 
    Age  Salary
0   0.0   217.0
1   NaN     NaN
2  22.0   262.0
3   0.0   352.0
4  50.0   570.0
5  99.0     0.0

Upvotes: 3

Onyambu

Reputation: 79338

c1 = df['Age'].isna()
c2 = df['Salary'].isna()

df[np.c_[c1 & ~c2, ~c1 & c2]]=0   
df
    Age  Salary
0   0.0   217.0
1   NaN     NaN
2  22.0   262.0
3   0.0   352.0
4  50.0   570.0
5  99.0     0.0

Upvotes: 2
