Nishant ranjan

Reputation: 93

Pandas numpy.where() use - Not getting the desired result

I am trying to merge two columns into a third column based on NaN values:

df['code2'] = np.where(df['code']==np.nan, df['code'], df['code1'])

Only the values from the code1 column are appearing in code2; the result is shown in the output image.

Please tell me what is wrong with the code I am writing. Thanks.

Upvotes: 3

Views: 4036

Answers (2)

Alexander

Reputation: 109756

The correct way to check whether a value is NaN is to use np.isnan(val):

>>> np.nan == np.nan
False

>>> np.isnan(np.nan)
True

>>> df = pd.DataFrame({'a': [np.nan, 1, 2]})
>>> np.isnan(df.a)
0     True
1    False
2    False
Name: a, dtype: bool
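Applied to the question's setup, the mask from np.isnan can serve as the np.where condition. This is a minimal sketch with made-up numeric data, assuming the goal is to pull values from code1 wherever code is missing (note the arm order is swapped relative to the question's snippet to achieve that):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample mirroring the question's layout
df = pd.DataFrame({'code': [np.nan, 1.0, np.nan],
                   'code1': [10.0, 20.0, 30.0]})

# np.isnan produces a boolean mask (works on float columns)
mask = np.isnan(df['code'])

# Where code is NaN take code1, otherwise keep code
df['code2'] = np.where(mask, df['code1'], df['code'])
print(df['code2'].tolist())  # → [10.0, 1.0, 30.0]
```

Note that np.isnan only accepts numeric input; for columns of strings or mixed types, the pandas isnull method below is the safer choice.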

Upvotes: 0

jezrael

Reputation: 863791

I think you need isnull for checking NaN:

df['code2'] = np.where(df['code'].isnull(), df['code'], df['code1'])

Docs:

Warning

One has to be mindful that in python (and numpy), the nan's don’t compare equal, but None's do. Note that Pandas/numpy uses the fact that np.nan != np.nan, and treats None like np.nan.

In [11]: None == None
Out[11]: True

In [12]: np.nan == np.nan
Out[12]: False

So as compared to above, a scalar equality comparison versus a None/np.nan doesn’t provide useful information.

In [13]: df2['one'] == np.nan
Out[13]: 
a    False
b    False
c    False
d    False
e    False
f    False
g    False
h    False
Name: one, dtype: bool
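A small sketch of why isnull is the more general check: it works on any dtype, whereas np.isnan raises a TypeError on object columns. The sample data below is made up, and the np.where arms are ordered on the assumption that the intent is to fill gaps in code from code1:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type columns, as the question's codes may be strings
df = pd.DataFrame({'code': [np.nan, 'A', np.nan],
                   'code1': ['X', 'Y', 'Z']})

# isnull works regardless of dtype (np.isnan would raise TypeError here)
df['code2'] = np.where(df['code'].isnull(), df['code1'], df['code'])
print(df['code2'].tolist())  # → ['X', 'A', 'Z']
```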

Upvotes: 7
