Reputation: 359
I'm looking to compare two dataframes which should be identical. However, due to floating-point precision, I am being told the values don't match. I have created an example to simulate it below. How can I get the correct result, so that the final comparison dataframe returns True for both cells?
import pandas as pd

a = pd.DataFrame({'A': [100, 97.35000000001]})
b = pd.DataFrame({'A': [100, 97.34999999999]})
print(a)
        A
0  100.00
1   97.35
print(b)
        A
0  100.00
1   97.35
print(a == b)
       A
0   True
1  False
Upvotes: 21
Views: 11208
Reputation: 1474
You can use pandas' built-in assert_frame_equal, which by default compares floating-point columns within a tolerance, much like numpy's isclose(). The advantage is that you can pass an entire dataframe with mixed column types. Note that it raises an AssertionError on a mismatch rather than returning a boolean dataframe.
For fine-tuning, see the rtol and atol arguments.
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2)
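Since assert_frame_equal raises instead of returning a value, here is a minimal sketch of how it could be wrapped to get a plain True/False for the frames from the question. The helper name frames_close is hypothetical (it is not part of pandas), and the snippet assumes a pandas version that provides pandas.testing.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Frames from the question: equal up to floating-point noise.
a = pd.DataFrame({'A': [100, 97.35000000001]})
b = pd.DataFrame({'A': [100, 97.34999999999]})


def frames_close(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, else False."""
    try:
        assert_frame_equal(left, right, **kwargs)
        return True
    except AssertionError:
        return False


print(frames_close(a, b))                    # tolerant comparison (default)
print(frames_close(a, b, check_exact=True))  # exact comparison
```

With the default tolerance the first call should print True. Passing check_exact=True forces an exact comparison, so the second call should print False for this data.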
Upvotes: 3
Reputation: 394459
OK, you can use np.isclose for this:
In [250]:
np.isclose(a,b)
Out[250]:
array([[ True],
[ True]], dtype=bool)
np.isclose takes a relative tolerance and an absolute tolerance. These have default values of rtol=1e-05 and atol=1e-08, respectively.
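If you want a comparison dataframe back, as in the question, rather than a bare array, one possible sketch is to wrap the result using the index and columns of a. It also spells out the documented check that np.isclose applies, abs(a - b) <= atol + rtol * abs(b). The variable names close and manual are just for illustration.

```python
import numpy as np
import pandas as pd

# Frames from the question.
a = pd.DataFrame({'A': [100, 97.35000000001]})
b = pd.DataFrame({'A': [100, 97.34999999999]})

# Element-wise tolerant comparison, wrapped back into a DataFrame
# so it has the same shape and labels as (a == b).
close = pd.DataFrame(
    np.isclose(a.to_numpy(), b.to_numpy(), rtol=1e-05, atol=1e-08),
    index=a.index,
    columns=a.columns,
)
print(close)

# The same check written out by hand with the default tolerances.
manual = (a - b).abs() <= (1e-08 + 1e-05 * b.abs())
print(manual)
```

Both frames should show True in every cell for this data. Tightening rtol and atol makes the comparison stricter if the defaults are too loose for your values.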
Upvotes: 22