jokol

Reputation: 383

Create a new column in Pandas Dataframe based on the 'NaN' values in other columns

What is the most efficient way to create a new column based on NaN values in other columns, given that the dataframe is very large? In other words, if any of the columns has a NaN in a given row, the corresponding value of the new column should be 1; otherwise it should be 0.

Note: The columns may have different dtypes (e.g. object), not just integers/floats.

X A   B
1 2   3    
4 NaN 1    
7 8   9    
3 2   NaN  
5 NaN 2   

Should give

X A   B    C
1 2   3    0
4 NaN 1    1
7 8   9    0
3 2   NaN  1
5 NaN 2    1

Code tried (thanks to some online help):

df['C'] = np.where(np.any(np.isnan(df[['A', 'B']])), 1, 0)

but it throws the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

And this returns an empty dataframe (since columns A and B never both have NaN values in the same row):

df['C'] = np.where(np.any(pd.isnull(df[['A', 'B']])), 1, 0)

Found a workaround:

df['C1'] = np.where(np.isnan(df['A'].values), 1, 0) 
df['C2'] = np.where(np.isnan(df['B'].values), 1, 0)
df['C'] = df[['C1','C2']].max(axis=1)

You may then drop C1 and C2.

Hope this helps.

Upvotes: 5

Views: 5179

Answers (2)

GIRISH kuniyal

Reputation: 770

This is simpler than you think. Hope this helps!

df['C'] = df.isna().sum(axis=1).apply(lambda x: 0 if x==0 else 1)
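
As a possible alternative, the per-row lambda can be avoided by reducing the boolean mask directly, which should scale better on a very large dataframe. This is only a sketch: the sample frame is rebuilt from the question's table, and restricting the check to columns A and B is an assumption about which columns matter.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "X": [1, 4, 7, 3, 5],
    "A": [2, np.nan, 8, 2, np.nan],
    "B": [3, 1, 9, np.nan, 2],
})

# isna() gives a boolean mask, any(axis=1) reduces it row by row,
# and astype(int) converts True/False to 1/0
df["C"] = df[["A", "B"]].isna().any(axis=1).astype(int)
```

Using df.isna() instead of df[["A", "B"]].isna() would check every column, as in the one-liner above.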

Upvotes: 6

BENY

Reputation: 323236

You are missing axis=1 in np.any:

np.where(np.any(np.isnan(df[['A', 'B']]),axis=1), 1, 0)
Out[80]: array([0, 1, 0, 1, 1])
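
Since the question mentions columns that are not purely numeric, np.isnan may still raise the same TypeError on object columns. A hedged variant of the same idea swaps in pd.isna, which also handles object dtype. The string values in column A below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: A holds strings (not numeric), B is numeric
df = pd.DataFrame({
    "A": ["x", np.nan, "y", "z", None],
    "B": [3, 1, 9, np.nan, 2],
})

# pd.isna works on mixed/object data, where np.isnan would raise TypeError
mask = pd.isna(df[["A", "B"]]).to_numpy()

# Without axis, any() collapses the whole mask to one boolean
collapsed = mask.any()

# With axis=1, any() is evaluated per row, giving one flag per row
c = np.where(mask.any(axis=1), 1, 0)
```

Here collapsed is a single True/False for the whole frame, which is why the original attempt without axis=1 could not produce a per-row column.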

Upvotes: 1
