Reputation: 571
I am trying to use the where function while ignoring NaN values. I do not wish to drop or replace the NaNs.
Here is a toy data set:
df=pd.DataFrame({
'A':[8,39,40,52],
'B':[8,39,np.nan,50],
})
Which gives:
A B
0 8 8.0
1 39 39.0
2 40 NaN
3 52 50.0
Desired result:
A B check
0 8 8.0 True
1 39 39.0 True
2 40 NaN Nan
3 52 50.0 False
I tried the following code but it did not work:
df = ((np.where(df['A']== df['B'], True, False))| df.isnull())
Upvotes: 2
Views: 1095
Reputation: 49
# Using df.apply
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [8, 39, 40, 52],
    'B': [8, 39, np.nan, 50],
})

def check(row):
    # Compare the two values only when neither of them is missing.
    if not (pd.isnull(row["A"]) or pd.isnull(row["B"])):
        if row["A"] == row["B"]:
            return "True"
        else:
            return "False"
    else:
        return "Nan"

# Note: the resulting column holds strings, not booleans.
df["check"] = df.apply(check, axis=1)
print(df)
output:
A B check
0 8 8.0 True
1 39 39.0 True
2 40 NaN Nan
3 52 50.0 False
Upvotes: 1
Reputation: 20669
You can build a mask marking the rows where either column is NaN, then insert np.nan at those rows using boolean masking.
m = df.isna().any(axis=1)
df['check'] = df['A'].eq(df['B'])
df.loc[m, 'check'] = np.nan # This would upcast bools to floats.
One workaround is to convert the check column's dtype to object using Series.astype.
m = df.isna().any(axis=1)
df['check'] = df['A'].eq(df['B'])
df['check'] = df['check'].astype(object)
df.loc[m, 'check'] = np.nan
A B check
0 8 8.0 True
1 39 39.0 True
2 40 NaN NaN
3 52 50.0 False
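As an alternative to the object workaround, and not something the original answer uses, pandas' nullable "boolean" extension dtype can hold True, False, and a missing value in the same column. The following is a minimal sketch under that assumption; the names has_nan and check are only illustrative, and the missing entry shows up as <NA> rather than NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [8, 39, 40, 52],
    'B': [8, 39, np.nan, 50],
})

# Rows where either column is missing.
has_nan = df[['A', 'B']].isna().any(axis=1)

# Cast the comparison to the nullable boolean dtype so that a
# missing marker can be stored without upcasting the column.
check = df['A'].eq(df['B']).astype('boolean')
check[has_nan] = pd.NA

df['check'] = check
print(df)
```

Whether <NA> is acceptable in place of NaN depends on what consumes the column afterwards.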
Upvotes: 4
Reputation: 150735
Just use Series.where:
df['A'].eq(df['B']).where(df[['A','B']].notna().all(1))
Note that the result can no longer stay a plain bool column: depending on the pandas version, True and False may be upcast to float or stored as object.
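If the True/False values should be preserved as they are, one possible variation on the where approach (an assumption on my part, combining it with the astype(object) idea from the other answer) is to cast the comparison to object before masking. The name both_present is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [8, 39, 40, 52],
    'B': [8, 39, np.nan, 50],
})

# True only for rows where both A and B are present.
both_present = df[['A', 'B']].notna().all(axis=1)

# Casting to object first means the NaN that where() inserts
# cannot force the booleans into a numeric dtype.
df['check'] = df['A'].eq(df['B']).astype(object).where(both_present)
print(df)
```

The trade-off is that an object column loses the speed and memory benefits of a native bool column.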
Upvotes: 4