Reputation: 649
Why are pandas NaN values sometimes typed as numpy.float64 and sometimes as float? This is confusing when I want to use a function to change values in a DataFrame depending on other columns.
Example:
   A    B    C
0  1  NaN    d
1  2    a    s
2  2    b    s
3  3    c  NaN
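A minimal sketch that rebuilds the example frame (values read off the table above) and shows where each NaN type comes from: np.nan is an ordinary Python float, a non-float column such as B hands that float back unchanged, and a float64 column hands back numpy.float64. The float Series x is a hypothetical addition for comparison, and the exact dtype pandas infers for B and C may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
# B and C mix strings with NaN, so they are not float64 columns.
df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": [np.nan, "a", "b", "c"],
    "C": ["d", "s", "s", np.nan],
})

# np.nan itself is an ordinary Python float
print(type(np.nan))             # <class 'float'>

# The NaN stored in the non-float column B comes back as a Python float
print(type(df.at[0, "B"]))

# In a float64 Series (hypothetical data), scalars come back as numpy.float64
x = pd.Series([1.0, np.nan])
print(type(x.iloc[1]))          # <class 'numpy.float64'>

# numpy.float64 subclasses float, so isinstance(..., float) is True for both
print(isinstance(x.iloc[1], float))  # True
```

Either way the value is still NaN, so the type difference does not matter as long as NaN is detected with pd.isna/pd.isnull rather than with ==.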
I have a function to change the values of column C:
def change_val(df):
    if df.A==1 and df.B==np.nan:
        return df.C
    else:
        return df.B
Then I apply this function to build the new column C:
df['C'] = df.apply(lambda x: change_val(x), axis=1)
Things go wrong at df.B==np.nan. How do I express this check correctly?
Desired result:
   A    B  C
0  1  NaN  d
1  2    a  a
2  2    b  b
3  3    c  c
Upvotes: 1
Views: 69
Reputation: 649
import pandas as pd

def change_val(df):
    if df.A==1 and pd.isnull(df.B):
        return df.C
    else:
        return df.B
NaN represents a missing value and is not equal to any value, not even NaN itself, so use isnull()/isna() instead of ==.
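A short sketch of the point above, reusing the question's example data: any == comparison with NaN is False, so the original condition can never match, while pd.isnull (and its alias pd.isna) detects NaN whether it is a Python float or a numpy.float64. The function parameter is renamed to row here only for readability; it is the same change_val logic as the answer.

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, including itself
print(np.nan == np.nan)               # False
print(np.float64("nan") == np.nan)    # False

# pd.isnull / pd.isna work for both NaN types
print(pd.isnull(np.nan), pd.isna(np.float64("nan")))  # True True

df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": [np.nan, "a", "b", "c"],
    "C": ["d", "s", "s", np.nan],
})

# Element-wise == against NaN is False everywhere, even in row 0 where B is NaN
print((df.B == np.nan).any())         # False

def change_val(row):
    # keep C only where A is 1 and B is missing; otherwise copy B
    if row.A == 1 and pd.isnull(row.B):
        return row.C
    return row.B

df["C"] = df.apply(change_val, axis=1)
print(df["C"].tolist())               # ['d', 'a', 'b', 'c']
```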
Upvotes: 0
Reputation: 863531
Use numpy.where or loc. To check for missing values, use the dedicated method Series.isna:
import numpy as np

mask = (df.A==1) & (df.B.isna())
# in older pandas versions, use isnull instead
#mask = (df.A==1) & (df.B.isnull())
df['C'] = np.where(mask, df.C, df.B)
Or:
df.loc[~mask, 'C'] = df.B
print(df)
   A    B  C
0  1  NaN  d
1  2    a  a
2  2    b  b
3  3    c  c
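A self-contained version of both vectorized approaches above, assuming the example frame from the question. Each approach works on its own copy (out1 and out2 are hypothetical names) so the two results stay independent; both should give the printed output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": [np.nan, "a", "b", "c"],
    "C": ["d", "s", "s", np.nan],
})

# True only for rows where A is 1 and B is missing
mask = (df.A == 1) & df.B.isna()

# Option 1: np.where takes C where mask is True, otherwise B
out1 = df.copy()
out1["C"] = np.where(mask, out1.C, out1.B)

# Option 2: overwrite C with B only where mask is False (aligned on the index)
out2 = df.copy()
out2.loc[~mask, "C"] = out2.B

print(out1)
print(out2)
```

np.where builds a whole new column in one step, while the loc version only writes the rows that change and leaves the rest of C untouched.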
For more information, see the pandas documentation on working with missing data.
Upvotes: 2