Reputation: 235
Let df_1 and df_2 be:
In [1]: import pandas as pd
...: df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
...: df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
In [2]: df_1
Out[2]:
a b
0 1 4
1 2 5
2 3 6
We add a row r to df_1:
In [3]: r = pd.DataFrame({'a': ['x'], 'b': ['y']})
...: df_1 = pd.concat([df_1, r], ignore_index=True)  # was df_1.append(r, ignore_index=True); DataFrame.append was removed in pandas 2.0
In [4]: df_1
Out[4]:
a b
0 1 4
1 2 5
2 3 6
3 x y
We now remove the added row from df_1 and get the original df_1 back again:
In [5]: df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)
In [6]: df_1
Out[6]:
a b
0 1 4
1 2 5
2 3 6
In [7]: df_2
Out[7]:
a b
0 1 4
1 2 5
2 3 6
While df_1 and df_2 are identical, equals() returns False.
In [8]: df_1.equals(df_2)
Out[8]: False
I did some research on SO but could not find a related question.
Am I doing something wrong? How do I get the correct result in this case?
(df_1==df_2).all().all() returns True, but it is not suitable for the case where df_1 and df_2 have different lengths.
Upvotes: 6
Views: 13593
Reputation: 33940
Use pandas.testing.assert_frame_equal(df_1, df_2, check_dtype=True), which also checks that the dtypes are the same.
(In this case it picks up that your dtypes changed from int64 to object when you appended, then deleted, a string row; pandas did not automatically coerce the dtype back down to a less expansive dtype.)
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]: object
[right]: int64
Upvotes: 7
Reputation: 235
Based on the comments of the others, in this case one can do:
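from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated

identical_df = True
try:
    # check_dtype=False compares the values and ignores the int64 vs. object mismatch
    assert_frame_equal(df_1, df_2, check_dtype=False)
except AssertionError:
    identical_df = False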
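The same idea can be wrapped in a small function, which also covers the case from the question where the two frames have different lengths. This is only a rough sketch: frames_match is a made-up helper name (not part of pandas), and the three small frames are illustrative stand-ins for df_1 and df_2.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left, right, check_dtype=False):
    """Return True if both frames hold the same values (hypothetical helper).

    With check_dtype=False, an object column holding 1, 2, 3 is treated as
    equal to an int64 column holding 1, 2, 3.
    """
    try:
        assert_frame_equal(left, right, check_dtype=check_dtype)
    except AssertionError:
        # Raised for differing shapes, labels, values or (optionally) dtypes.
        return False
    return True


ints = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
objs = ints.astype(object)   # same values, object dtype (as in the question)
shorter = ints.iloc[:2]      # different length

same_values = frames_match(ints, objs)                        # dtypes ignored
same_values_and_dtypes = frames_match(ints, objs, check_dtype=True)
same_as_shorter = frames_match(ints, shorter)
```

Here same_values should come out True, while same_values_and_dtypes and same_as_shorter should both be False, since a dtype or shape mismatch makes assert_frame_equal raise.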
Upvotes: 1
Reputation: 34046
As per the df.equals docs:
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must be the same dtype.
So, df.equals will return True only when the elements have the same values and the dtypes are also the same.
When you add and then delete the row from df_1, the dtypes change from int to object, hence it returns False.
Explanation with your example:
In [1028]: df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
In [1029]: df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
In [1031]: df_1.dtypes
Out[1031]:
a int64
b int64
dtype: object
In [1032]: df_2.dtypes
Out[1032]:
a int64
b int64
dtype: object
So, as you can see above, the dtypes of both DataFrames are the same, hence the condition below returns True:
In [1030]: df_1.equals(df_2)
Out[1030]: True
Now after you add and remove the row:
In [1033]: r = pd.DataFrame({'a': ['x'], 'b': ['y']})
In [1034]: df_1 = pd.concat([df_1, r], ignore_index=True)  # was df_1.append(r, ignore_index=True); DataFrame.append was removed in pandas 2.0
In [1036]: df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)
In [1038]: df_1.dtypes
Out[1038]:
a object
b object
dtype: object
The dtype has changed to object, hence the condition below returns False:
In [1039]: df_1.equals(df_2)
Out[1039]: False
To get True, you need to change the dtypes back to int:
In [1042]: df_1 = df_1.astype(int)
In [1044]: df_1.equals(df_2)
Out[1044]: True
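If you would rather not hard-code int, a possible alternative is DataFrame.infer_objects(), which asks pandas to re-infer a better dtype for object columns. A minimal sketch, assuming the leftover object columns contain only plain integers; df_ref and df_obj are illustrative names, with df_obj standing in for df_1 after the add/remove round trip:

```python
import pandas as pd

df_ref = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Stand-in for df_1 after the round trip: same values, object dtype.
df_obj = df_ref.astype(object)

equal_before = df_obj.equals(df_ref)   # dtypes differ (object vs. int64)

# Let pandas infer a more specific dtype for each object column.
restored = df_obj.infer_objects()

equal_after = restored.equals(df_ref)
```

In this sketch equal_before should be False and equal_after should be True. If a column still held a string such as 'x', infer_objects() would leave it as object, so this only helps once the non-numeric rows are gone.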
Upvotes: 3
Reputation: 2696
This again is a subtle one, well done for spotting it.
import pandas as pd
df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})
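df_1 = pd.concat([df_1, r], ignore_index=True)  # was df_1.append(r, ignore_index=True); DataFrame.append was removed in pandas 2.0
df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)
df_1.equals(df_2)  # False
from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated
assert_frame_equal(df_1, df_2)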
Now we can see the issue as the assert fails.
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]: object
[right]: int64
Because you added strings to integer columns, the integers became objects, which is why equals() fails as well.
Upvotes: 10