Pandas NaN values causing trouble when changing values depending on other columns

Question

Why are pandas NaN values sometimes typed as numpy.float64 and sometimes as float? This is confusing when I want to use a function to change values in a DataFrame depending on other columns.

Example:

   A    B    C
0  1  NaN    d
1  2    a    s
2  2    b    s
3  3    c  NaN

I have a function to change the value of column C:

def change_val(df):
    if df.A==1 and df.B==np.nan:
        return df.C
    else:
        return df.B

Then I apply this function to column C:

df['C']=df.apply(lambda x: change_val(x),axis=1)

Things go wrong at df.B==np.nan. How do I express this correctly?

Desired result:

   A    B    C
0  1  NaN    d
1  2    a    a
2  2    b    b
3  3    c    c

jezrael · Accepted Answer

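Use numpy.where or DataFrame.loc. NaN never compares equal to anything, including itself, so df.B==np.nan is always False; to check for missing values, use the dedicated method Series.isna: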

mask = (df.A==1) & (df.B.isna())
# for older pandas versions
#mask = (df.A==1) & (df.B.isnull())
df['C'] = np.where(mask, df.C, df.B)

Or:

df.loc[~mask, 'C'] = df.B

print(df)
   A    B  C
0  1  NaN  d
1  2    a  a
2  2    b  b
3  3    c  c
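
To try both variants end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand; the names via_where and via_loc are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": [np.nan, "a", "b", "c"],
    "C": ["d", "s", "s", np.nan],
})

# Rows where A is 1 and B is missing keep their current C value.
mask = (df["A"] == 1) & df["B"].isna()

# Variant 1: numpy.where builds the whole column at once.
via_where = df.copy()
via_where["C"] = np.where(mask, via_where["C"], via_where["B"])

# Variant 2: loc overwrites only the rows outside the mask.
# The right-hand side is aligned on the index before assignment.
via_loc = df.copy()
via_loc.loc[~mask, "C"] = via_loc["B"]

print(via_where)
print(via_loc)
```

Both variants should give the same C column. The loc form leaves the masked rows untouched, which can be handy when most rows stay as they are.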

For more information about working with missing data, check the pandas docs.
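
If you would rather keep the row-wise apply from the question, the sketch below shows one way to do it, assuming the same example frame. It swaps the equality check for pd.isna, which also accepts a single scalar. It also touches on the type confusion: np.nan itself is a plain Python float, while a value read out of a float64 Series comes back as numpy.float64, and pd.isna treats both as missing.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, not even to itself.
print(np.nan == np.nan)   # False
print(pd.isna(np.nan))    # True

# The same missing value can show up under two Python types.
print(type(np.nan))                              # <class 'float'>
print(type(pd.Series([1.0, np.nan]).iloc[1]))    # <class 'numpy.float64'>

def change_val(row):
    # pd.isna works on a scalar, so it fits inside a row-wise function.
    if row["A"] == 1 and pd.isna(row["B"]):
        return row["C"]
    return row["B"]

df = pd.DataFrame({
    "A": [1, 2, 2, 3],
    "B": [np.nan, "a", "b", "c"],
    "C": ["d", "s", "s", np.nan],
})

df["C"] = df.apply(change_val, axis=1)
print(df)
```

This is usually slower than the vectorized mask on large frames, since apply with axis=1 calls the function once per row.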
