Reputation: 989
If I run the following code:
import numpy as np
import pandas as pd
dft1 = pd.DataFrame({'a': [1, np.nan, np.nan]})
dft2 = pd.DataFrame({'a': [1, 1, np.nan]})
dft1.a == dft2.a
The result is
0 True
1 False
2 False
Name: a, dtype: bool
How can I make the result be
0 True
1 False
2 True
Name: a, dtype: bool
That is, I want np.nan == np.nan to evaluate to True.
I thought this was basic functionality and that I must be asking a duplicate question, but I spent a lot of time searching SO and Google and couldn't find it.
Upvotes: 18
Views: 4759
Reputation: 294516
np.nan is defined to not be equal to np.nan.
Check whether each pair is equal or consists entirely of np.nan:
def naneq(t):
    # t is an (a, b) pair: it matches if the values are equal or both are NaN
    return (t[0] == t[1]) or np.isnan(t).all()
[*map(naneq, zip(dft1.a, dft2.a))]
[True, False, True]
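One caveat: np.isnan only accepts numeric input, so naneq raises a TypeError as soon as it reaches an unequal pair of strings. Below is a minimal sketch of a variant that also handles object columns, assuming pd.isna is an acceptable definition of "missing" (it also treats None and NaT as missing). The helper name naneq_scalar and the sample Series are hypothetical.

```python
import numpy as np
import pandas as pd


def naneq_scalar(a, b):
    """Hypothetical helper: True if a equals b, or if both are missing."""
    # pd.isna on a scalar returns a plain bool, so strings are fine here,
    # unlike np.isnan, which rejects non-numeric input.
    return bool(a == b) or (pd.isna(a) and pd.isna(b))


s1 = pd.Series(['x', None, np.nan, 'y'], dtype=object)
s2 = pd.Series(['x', 'z', None, 'w'], dtype=object)

result = [naneq_scalar(a, b) for a, b in zip(s1, s2)]
print(result)  # [True, False, True, False]
```

Note that the helper takes two scalars rather than a tuple: pd.isna does not treat a tuple as array-like, so calling it on the whole pair would not give an elementwise answer.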
nunique
Count the unique values in each row. Make sure to set dropna=False so that NaN is counted as a value:
pd.concat([dft1, dft2], axis=1).nunique(axis=1, dropna=False) == 1
0 True
1 False
2 True
dtype: bool
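To see why dropna=False matters, compare the row-wise counts with and without it. This is a small sketch reusing the question's frames; the keys 'left' and 'right' are hypothetical labels added only so the two columns get distinct names.

```python
import numpy as np
import pandas as pd

dft1 = pd.DataFrame({'a': [1, np.nan, np.nan]})
dft2 = pd.DataFrame({'a': [1, 1, np.nan]})

pair = pd.concat([dft1.a, dft2.a], axis=1, keys=['left', 'right'])

# Default dropna=True ignores NaN: [NaN, 1] has 1 unique value and
# [NaN, NaN] has 0, so rows 1 and 2 come out wrong.
with_dropna = pair.nunique(axis=1) == 1
# dropna=False counts NaN as a value of its own: [NaN, NaN] -> 1.
without_dropna = pair.nunique(axis=1, dropna=False) == 1

print(with_dropna.tolist())     # [True, True, False]
print(without_dropna.tolist())  # [True, False, True]
```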
Upvotes: 2
Reputation: 51175
Using np.isclose with equal_nan=True:
np.isclose(dft1, dft2, equal_nan=True, rtol=0, atol=0)
array([[ True],
[False],
[ True]])
It's important to set both atol and rtol to zero; otherwise values that are merely close to each other (within the default tolerances) will also be reported as equal.
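A quick sketch of the tolerance issue, using made-up values that differ by 1e-9 (numpy's defaults are rtol=1e-05 and atol=1e-08). The last part wraps the ndarray back into a DataFrame, which assumes dft1 and dft2 already share the same index and columns.

```python
import numpy as np
import pandas as pd

a, b = 1.0, 1.0 + 1e-9

# Default tolerances treat these two values as equal.
loose = np.isclose(a, b, equal_nan=True)
# With both tolerances at zero, only exact matches (or NaN vs NaN) pass.
strict = np.isclose(a, b, equal_nan=True, rtol=0, atol=0)
nan_pair = np.isclose(np.nan, np.nan, equal_nan=True, rtol=0, atol=0)

print(loose, strict, nan_pair)  # True False True

dft1 = pd.DataFrame({'a': [1, np.nan, np.nan]})
dft2 = pd.DataFrame({'a': [1, 1, np.nan]})

# Restore the labels that np.isclose drops when it returns an ndarray.
mask = pd.DataFrame(
    np.isclose(dft1, dft2, equal_nan=True, rtol=0, atol=0),
    index=dft1.index,
    columns=dft1.columns,
)
print(mask)
```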
Upvotes: 9
Reputation: 403110
I can't think of a function that already does this for you (weird), so you can just do it yourself:
dft1.eq(dft2) | (dft1.isna() & dft2.isna())
a
0 True
1 False
2 True
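Note the parentheses. They are not strictly required here, since & binds more tightly than |, but they make the intent explicit. Precedence is something to watch out for when working with the overloaded bitwise operators in pandas, because they also bind more tightly than comparisons such as == or >.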
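A small sketch of where precedence actually bites: because & binds more tightly than comparison operators, leaving out the parentheses around comparisons changes how the expression parses. The Series s is made-up data.

```python
import pandas as pd

s = pd.Series([1, 3, 7])

# Intended: elementwise "greater than 0 AND less than 5".
ok = (s > 0) & (s < 5)
print(ok.tolist())  # [True, True, False]

# Without parentheses this parses as s > (0 & s) < 5, a chained comparison
# that asks Python for the truth value of a whole boolean Series.
try:
    s > 0 & s < 5
except ValueError as exc:
    print('ValueError:', exc)
```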
Another option is to use np.nan_to_num, provided you are certain the index and columns of both DataFrames are identical; the comparison is purely positional, so otherwise the result is not valid:
np.nan_to_num(dft1) == np.nan_to_num(dft2)
array([[ True],
[False],
[ True]])
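np.nan_to_num replaces NaNs with a filler value (0.0 by default for float arrays; non-float arrays, such as string arrays where NaN has already become the string 'nan', are returned unchanged). One consequence is that a NaN in one frame compares equal to a genuine 0 in the other.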
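A minimal sketch of that false positive with made-up arrays, alongside the mask-based comparison from above applied directly to the ndarrays:

```python
import numpy as np

a = np.array([0.0, np.nan, 2.0])
b = np.array([np.nan, np.nan, 2.0])

# nan_to_num turns NaN into 0.0, so the real 0.0 in `a` now "matches" the NaN in `b`.
naive = np.nan_to_num(a) == np.nan_to_num(b)
print(naive.tolist())  # [True, True, True]

# Comparing values and NaN masks separately keeps the two cases apart.
safe = (a == b) | (np.isnan(a) & np.isnan(b))
print(safe.tolist())  # [False, True, True]
```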
Upvotes: 13
Reputation: 323376
Since np.nan is not equal to np.nan:
np.nan == np.nan
False
you can fill the NaNs with the same placeholder on both sides before comparing:
dft1.a.fillna('NaN') == dft2.a.fillna('NaN')
0 True
1 False
2 True
Name: a, dtype: bool
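One thing to keep in mind with a string placeholder: if a column can already contain the literal string 'NaN', it becomes indistinguishable from a filled-in missing value. A small sketch with made-up object Series, compared against the mask-based approach:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['NaN', 'x'], dtype=object)
s2 = pd.Series([np.nan, 'x'], dtype=object)

# The literal 'NaN' in s1 collides with the placeholder used for s2's NaN.
filled = s1.fillna('NaN') == s2.fillna('NaN')
print(filled.tolist())  # [True, True]

# Checking missingness directly avoids the collision.
masked = (s1 == s2) | (s1.isna() & s2.isna())
print(masked.tolist())  # [False, True]
```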
Upvotes: 5