josepmaria

Reputation: 571

Where function ignoring NaN

I am trying to use the where function to compare two columns while ignoring NaN values. I do not want to drop or replace the NaNs.

Here is a toy data set:

df=pd.DataFrame({
                 'A':[8,39,40,52],
                 'B':[8,39,np.nan,50],
                 })

Which gives:

    A    B
0   8   8.0
1   39  39.0
2   40  NaN
3   52  50.0

Desired result:

    A   B       check
0   8   8.0     True
1   39  39.0    True
2   40  NaN     NaN
3   52  50.0    False

I tried the following code but it did not work:

df = ((np.where(df['A']== df['B'], True, False))| df.isnull())

Upvotes: 2

Views: 1095

Answers (3)

Jay

Reputation: 49

# Using df.apply
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [8, 39, 40, 52],
    'B': [8, 39, np.nan, 50],
})

def check(row):
    # Note: returns the strings "True"/"False"/"Nan", not booleans or np.nan
    if pd.isnull(row["A"]) or pd.isnull(row["B"]):
        return "Nan"
    return "True" if row["A"] == row["B"] else "False"

# Apply the function row by row (axis=1)
df["check"] = df.apply(check, axis=1)
print(df)

Output:
    A     B  check
0   8   8.0   True
1  39  39.0   True
2  40   NaN    Nan
3  52  50.0  False

Upvotes: 1

Ch3steR

Reputation: 20669

You can build a mask marking the rows where either column is NaN, then insert np.nan there using boolean masking.

m = df.isna().any(axis=1)
df['check'] = df['A'].eq(df['B'])
df.loc[m, 'check'] = np.nan # This would upcast bools to floats.

One workaround is to cast the check column to object dtype using Series.astype before inserting np.nan.

m = df.isna().any(axis=1)
df['check'] = df['A'].eq(df['B'])
df['check'] = df['check'].astype(object)
df.loc[m, 'check'] = np.nan

   A     B  check
0   8   8.0   True
1  39  39.0   True
2  40   NaN    NaN
3  52  50.0  False
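
Another option, not part of the answer above and assuming pandas 1.0 or later, is the nullable "boolean" extension dtype, which can hold True, False, and pd.NA in one column without falling back to object. A minimal sketch on the same toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 39, 40, 52], "B": [8, 39, np.nan, 50]})

# Rows where either column is missing
m = df[["A", "B"]].isna().any(axis=1)

# Element-wise comparison, converted to the nullable boolean dtype
check = df["A"].eq(df["B"]).astype("boolean")

# Mark rows with a missing value as pd.NA instead of True/False
check[m] = pd.NA

df["check"] = check
print(df)
```

With this dtype the missing entry is displayed as <NA> rather than NaN, so it only fits if pd.NA is acceptable in the result.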

Upvotes: 4

Quang Hoang

Reputation: 150735

Just do a where:

df['A'].eq(df['B']).where(df[['A','B']].notna().all(1))

But then this would upcast True and False to float.
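
For reference, here is the one-liner run end to end on the toy data from the question. This is only a sketch; the variable name both_present is illustrative, and the dtype is printed rather than assumed, since it is worth checking on your own pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8, 39, 40, 52], "B": [8, 39, np.nan, 50]})

# True only for rows where both A and B are present
both_present = df[["A", "B"]].notna().all(axis=1)

# Keep the comparison result where both values exist; NaN elsewhere
df["check"] = df["A"].eq(df["B"]).where(both_present)

print(df)
print(df["check"].dtype)  # inspect the resulting dtype instead of assuming it
```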

Upvotes: 4
