Reputation: 694
I want to compare two columns in a dataframe which may contain NaN values.
Based on this link I have tried to adapt my code, but am struggling with the following:
(s1[s1.notnull()] == s2[s2.notnull()]).all()
How can I adjust this code so that I do not need to extract s1 and s2 as Series from the dataframe, but can apply it to the dataframe directly:
import pandas as pd
import numpy as np
mydata = [{'SystemA': 1.78, 'SystemB': 1.78},
          {'SystemA': 2.5, 'SystemB': 2.5},
          {'SystemA': np.nan, 'SystemB': np.nan}]
df = pd.DataFrame(mydata)
print(df)
# does not return single True or False
# gives wrong result when e.g. 2.5 is changed to 2.6 in just one column
c = (df['SystemA'].notnull() == df['SystemB'].notnull()).all()
print('\nc:',c)
Upvotes: 3
Views: 11514
Reputation: 352959
The equals method ignores NaNs:
>>> df
SystemA SystemB
0 1.78 1.78
1 2.50 2.50
2 NaN NaN
>>> (df["SystemA"] == df["SystemB"]).all()
False
>>> df["SystemA"].equals(df["SystemB"])
True
>>> df.iloc[1,1] = 2.6
>>> df
SystemA SystemB
0 1.78 1.78
1 2.50 2.60
2 NaN NaN
>>> df["SystemA"].equals(df["SystemB"])
False
although it also checks for dtype equality, so if you have a Series of integers and a Series of floats, it'll say they're not equal even if the values match. This may or may not be a problem for you.
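As a quick illustration of that caveat (a minimal sketch with made-up integer/float Series, not the question's data):
>>> import pandas as pd
>>> a = pd.Series([1, 2, 3])        # dtype int64
>>> b = pd.Series([1.0, 2.0, 3.0])  # dtype float64
>>> (a == b).all()                  # element-wise the values match
True
>>> a.equals(b)                     # but equals also compares dtypes
False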
You can also roll the check yourself, treating a NaN in both columns as a match (shown here on the original data, before the 2.6 change):
>>> ((df.SystemA == df.SystemB) | (df.SystemA.isnull() & df.SystemB.isnull())).all()
True
Upvotes: 12
Reputation: 8207
import pandas as pd
import numpy as np

mydata = [{'SystemA': 1.78, 'SystemB': 1.78},
          {'SystemA': 2.5, 'SystemB': 2.5},
          {'SystemA': np.nan, 'SystemB': np.nan}]
df = pd.DataFrame(mydata)
This will get you rows where both cols are null:
df[pd.isnull(df['SystemA']) & (pd.isnull(df['SystemB']))]
SystemA SystemB
2 NaN NaN
This will get you rows where both cols are notnull:
df[(df['SystemA'].notnull()) & (df['SystemB'].notnull())]
SystemA SystemB
0 1.78 1.78
1 2.50 2.50
I'm assuming you want something along the lines of one of those?
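If, as in the question, you want a single True/False rather than the filtered rows, a minimal sketch (reusing the same null checks) could combine the two conditions and reduce with .all():
# treat rows where both columns are NaN as matches, then reduce to one boolean
both_null = pd.isnull(df['SystemA']) & pd.isnull(df['SystemB'])
same_value = df['SystemA'] == df['SystemB']
c = (same_value | both_null).all()
print(c)  # True for the data above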
Upvotes: -1
Reputation: 210812
Is this what you are looking for?
df['SystemA'].dropna() == df['SystemB'].dropna()
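Note that this returns an element-wise Series rather than a single boolean, and it assumes the NaNs sit in the same rows of both columns (otherwise the two dropna results have different indexes and the comparison raises a ValueError). A minimal sketch to reduce it to one value:
# assumes NaNs occur in the same rows of both columns; with mismatched indexes
# the Series comparison raises "Can only compare identically-labeled Series objects"
result = (df['SystemA'].dropna() == df['SystemB'].dropna()).all()
print(result)  # True for the example data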
Upvotes: -2