Reputation: 383
What is the most efficient way to create a new column based on NaN values in other columns (considering the dataframe is very large)?
In other words, if any column has a NaN in one of the rows, the corresponding value of the new column should be 1.
Note: The columns may have different dtypes (e.g. object), not just integers/floats.
X A B
1 2 3
4 NaN 1
7 8 9
3 2 NaN
5 NaN 2
Should give
X A B C
1 2 3 0
4 NaN 1 1
7 8 9 0
3 2 NaN 1
5 NaN 2 1
Code Tried (Thanks to some online help):
df['C'] = np.where(np.any(np.isnan(df[['A', 'B']])), 1, 0)
but it throws the following error
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
And this returns an empty dataframe (since columns A and B never both have NaN values in a single row):
df['C'] = np.where(np.any(pd.isnull(df[['A', 'B']])), 1, 0)
Found a workaround:
df['C1'] = np.where(np.isnan(df['A'].values), 1, 0)
df['C2'] = np.where(np.isnan(df['B'].values), 1, 0)
df['C'] = df[['C1','C2']].max(axis=1)
You may then drop C1 and C2.
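For completeness, here is a minimal sketch of the whole workaround including the drop step, run on the sample data from the question. It assumes A and B are float columns, since np.isnan would still raise the TypeError above on object dtype.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (A and B assumed to be float columns)
df = pd.DataFrame({
    "X": [1, 4, 7, 3, 5],
    "A": [2, np.nan, 8, 2, np.nan],
    "B": [3, 1, 9, np.nan, 2],
})

# One helper flag per column, then combine them row-wise
df["C1"] = np.where(np.isnan(df["A"].values), 1, 0)
df["C2"] = np.where(np.isnan(df["B"].values), 1, 0)
df["C"] = df[["C1", "C2"]].max(axis=1)

# Remove the helper columns once C is built
df = df.drop(columns=["C1", "C2"])
print(df)
```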
Hope this helps~
Upvotes: 5
Views: 5179
Reputation: 770
This is simpler than you think. Hope this helps!
df['C'] = df.isna().sum(axis=1).apply(lambda x: 0 if x==0 else 1)
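Since the dataframe is very large, it may be worth avoiding the row-by-row apply. A possible variant (a sketch, not benchmarked here) stays fully vectorized by asking whether any value in the row is missing and casting the boolean result to int:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "X": [1, 4, 7, 3, 5],
    "A": [2, np.nan, 8, 2, np.nan],
    "B": [3, 1, 9, np.nan, 2],
})

# True where the row has at least one missing value, then cast to 0/1
df["C"] = df.isna().any(axis=1).astype(int)
print(df["C"].tolist())  # [0, 1, 0, 1, 1]
```

To check only some columns, select them first, e.g. df[['A', 'B']].isna().any(axis=1).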
Upvotes: 6
Reputation: 323236
You are missing axis=1 in any:
np.where(np.any(np.isnan(df[['A', 'B']]),axis=1), 1, 0)
Out[80]: array([0, 1, 0, 1, 1])
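Note that np.isnan only accepts numeric input, so with object columns (as mentioned in the question) the TypeError would likely remain. A sketch of the same idea using pd.isnull, which also handles object dtype; the string column below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one float column and one object column
df = pd.DataFrame({
    "A": [2, np.nan, 8, 2, np.nan],
    "B": ["x", "y", "z", None, "w"],  # object dtype, None counts as missing
})

# Row-wise check: is any of the selected columns missing in this row?
mask = pd.isnull(df[["A", "B"]]).any(axis=1)
flags = np.where(mask, 1, 0)
print(flags)  # [0 1 0 1 1]
```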
Upvotes: 1