Reputation: 4298
I have noticed a quirky thing. Let's say A and B are DataFrames.
A is:
A
a b c
0 x 1 a
1 y 2 b
2 z 3 c
3 w 4 d
B is:
B
a b c
0 1 x a
1 2 y b
2 3 z c
3 4 w d
As we can see above, the elements under columns a and b in A and B are different, but A.equals(B) yields True.
A==B correctly shows that the elements are not equal:
A==B
a b c
0 False False True
1 False False True
2 False False True
3 False False True
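For anyone who wants to reproduce this, here is a minimal sketch that rebuilds A and B from the JSON shown further down. The explicit `dtype=object` is my assumption, chosen to match the ObjectBlock in the `._data` output; the result of `A.equals(B)` may differ between pandas versions, so no expected value is given for it.

```python
import pandas as pd

# Rebuild the two frames; string columns are forced to object dtype
# (an assumption made to mirror the ObjectBlock shown in ._data).
A = pd.DataFrame({
    "a": pd.Series(["x", "y", "z", "w"], dtype=object),
    "b": [1, 2, 3, 4],
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})
B = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": pd.Series(["x", "y", "z", "w"], dtype=object),
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})

# Element-wise comparison: columns a and b differ, column c matches.
elementwise = A == B
print(elementwise)

# Whole-frame comparison; the outcome can depend on the pandas version.
print(A.equals(B))
```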
Question: Can someone please explain why .equals() yields True? I also researched this topic on SO. As per the contract of pandas.DataFrame.equals, pandas must return False here.
I am a beginner, so I'd appreciate any help.
Here are the JSON format and ._data of A and B:
A
`A.to_json()`
Out[114]: '{"a":{"0":"x","1":"y","2":"z","3":"w"},"b":{"0":1,"1":2,"2":3,"3":4},"c":{"0":"a","1":"b","2":"c","3":"d"}}'
and A._data
is
BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(1, 2, 1), 1 x 4, dtype: int64
ObjectBlock: slice(0, 4, 2), 2 x 4, dtype: object
B
B's json format:
B.to_json()
'{"a":{"0":1,"1":2,"2":3,"3":4},"b":{"0":"x","1":"y","2":"z","3":"w"},"c":{"0":"a","1":"b","2":"c","3":"d"}}'
B._data
BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(0, 1, 1), 1 x 4, dtype: int64
ObjectBlock: slice(1, 3, 1), 2 x 4, dtype: object
Upvotes: 2
Views: 653
Reputation: 1644
As an alternative to sacul's and U9-Forward's answers: I did some further analysis, and it looks like the reason you are seeing True and not False, as you expected, might have more to do with this line of the docs:
This function requires that the elements have the same dtype as their respective elements in the other Series or DataFrame.
With the above dataframes, when I run df.equals(), this is what is returned:
>>> A.equals(B)
Out: True
>>> B.equals(C)
Out: False
These two results align with what the other answers are saying: A and B have the same shape and the same elements, so they are considered equal, while B and C have the same shape but different elements, so they are not.
On the other hand:
>>> A.equals(D)
Out: False
Here A and D have the same shape and the same elements, but the comparison still returns False. The difference between this case and the one above is that all of the dtypes in the comparison match up, as the docs quote above says. A and D both have the dtypes: str, int, str.
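To see how the dtypes line up column by column, something like the following sketch can help. It assumes A and B as reconstructed from the question's JSON, with string columns stored as object dtype; the variable names are only illustrative.

```python
import pandas as pd

# A and B rebuilt from the question (object dtype for strings is an assumption).
A = pd.DataFrame({
    "a": pd.Series(["x", "y", "z", "w"], dtype=object),
    "b": [1, 2, 3, 4],
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})
B = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": pd.Series(["x", "y", "z", "w"], dtype=object),
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})

# Dtypes per column, in column order, as plain strings.
a_dtypes = [str(t) for t in A.dtypes]
b_dtypes = [str(t) for t in B.dtypes]
print(a_dtypes)  # ['object', 'int64', 'object']
print(b_dtypes)  # ['int64', 'object', 'object']

# Same set of dtypes overall, but not the same dtype in each column position.
same_dtype_per_column = a_dtypes == b_dtypes
same_dtypes_ignoring_order = sorted(a_dtypes) == sorted(b_dtypes)
print(same_dtype_per_column, same_dtypes_ignoring_order)  # False True
```

So for these two frames the dtypes only match if column position is ignored.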
Upvotes: 1
Reputation: 71570
From the docs:
Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
Determines if two NDFrame objects contain the same elements!!!
ELEMENTS, not including COLUMNS.
So that's why it returns True.
If you want it to return False and check the columns, do:
print((A==B).all().all())
Output:
False
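If a stricter check is wanted, `pandas.testing.assert_frame_equal` is another option: it raises an `AssertionError` when the frames differ in values, dtypes, or labels. Below is a rough sketch; `frames_match` is a hypothetical helper name of mine, and A and B are rebuilt from the question with object dtype assumed for the string columns.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# A and B rebuilt from the question (object dtype for strings is an assumption).
A = pd.DataFrame({
    "a": pd.Series(["x", "y", "z", "w"], dtype=object),
    "b": [1, 2, 3, 4],
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})
B = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": pd.Series(["x", "y", "z", "w"], dtype=object),
    "c": pd.Series(["a", "b", "c", "d"], dtype=object),
})


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True only if assert_frame_equal does not raise."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


print(frames_match(A, A.copy()))  # True
print(frames_match(A, B))         # False
```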
Upvotes: 2
Reputation: 51335
As in the answer you linked in your question, essentially the behaviour of pandas.DataFrame.equals mimics numpy.array_equal.
The docs for np.array_equal state that it returns:
True if two arrays have the same shape and elements, False otherwise.
Your two dataframes satisfy this.
Upvotes: 1