watchtower

Reputation: 4298

Quirky behavior of pandas.DataFrame.equals

I have noticed a quirky thing. Let's say A and B are dataframes.

A is:

A
   a  b  c
0  x  1  a
1  y  2  b
2  z  3  c
3  w  4  d

B is:

B
   a  b  c
0  1  x  a
1  2  y  b
2  3  z  c
3  4  w  d

As we can see above, the elements under columns a and b in A and B are different, but A.equals(B) yields True.

A==B correctly shows that the elements are not equal:

A==B
       a      b     c
0  False  False  True
1  False  False  True
2  False  False  True
3  False  False  True
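For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames from the printed output above. The constructor calls are my assumption about how A and B were created (the original construction is not shown), and the result of equals() may well differ between pandas versions, so no output is claimed for it.

```python
import pandas as pd

# Rebuild the two frames; labels and values are copied from the printed output.
A = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": [1, 2, 3, 4], "c": ["a", "b", "c", "d"]})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "w"], "c": ["a", "b", "c", "d"]})

# Element-wise comparison, aligned on row and column labels.
elementwise = A == B
print(elementwise)

# Whole-frame comparison; what this prints may depend on the installed pandas version.
print(A.equals(B))
```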

Question: Can someone please explain why .equals() yields True? I researched this topic on SO, and as per the contract of pandas.DataFrame.equals, pandas should return False here.

I am a beginner, so I'd appreciate any help.


Here are the JSON format and ._data of A and B:

A

`A.to_json()`
Out[114]: '{"a":{"0":"x","1":"y","2":"z","3":"w"},"b":{"0":1,"1":2,"2":3,"3":4},"c":{"0":"a","1":"b","2":"c","3":"d"}}'

and A._data is

BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(1, 2, 1), 1 x 4, dtype: int64
ObjectBlock: slice(0, 4, 2), 2 x 4, dtype: object

B

B's json format:

B.to_json()
'{"a":{"0":1,"1":2,"2":3,"3":4},"b":{"0":"x","1":"y","2":"z","3":"w"},"c":{"0":"a","1":"b","2":"c","3":"d"}}'


B._data
BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(0, 1, 1), 1 x 4, dtype: int64
ObjectBlock: slice(1, 3, 1), 2 x 4, dtype: object
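The ._data attribute is private, so its layout can change between versions. A rough sketch of the same information through the public API, assuming the frames are built as below (the construction is my assumption), is to look at the per-column dtypes and list the columns where they disagree:

```python
import pandas as pd

# Assumed reconstruction of the frames from the JSON dumps above.
A = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": [1, 2, 3, 4], "c": ["a", "b", "c", "d"]})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "w"], "c": ["a", "b", "c", "d"]})

# Public, per-column view of the dtypes.
print(A.dtypes)
print(B.dtypes)

# Column labels whose dtype differs between the two frames.
mismatched = [col for col in A.columns if A[col].dtype != B[col].dtype]
print(mismatched)
```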

Upvotes: 2

Views: 653

Answers (3)

girlvsdata

Reputation: 1644

As an alternative to sacul's and U9-Forward's answers: I've done some further analysis, and it looks like the reason you are seeing True rather than the False you expected has more to do with this line of the docs:

This function requires that the elements have the same dtype as their respective elements in the other Series or DataFrame.

[image: dataframes]

With the above dataframes, when I run df.equals(), this is what is returned:

>>> A.equals(B)
Out: True
>>> B.equals(C)
Out: False

These two results align with what the other answers say: A and B have the same shape and the same elements, so they are considered equal, while B and C have the same shape but different elements, so they are not.

On the other hand:

>>> A.equals(D)
Out: False

Here A and D have the same shape and the same elements, but the comparison still returns False. The difference between this case and the one above is that all of the dtypes in the comparison match up, as the docs quote above says. A and D both have the dtypes: str, int, str.
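If the goal is a check that is strict about dtypes as well as values and labels, one option is pandas.testing.assert_frame_equal, which raises an AssertionError on a mismatch. The sketch below wraps it in a helper; the name strictly_equal and the frame construction are hypothetical, chosen only for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def strictly_equal(left, right):
    """Hypothetical helper: True only if labels, dtypes and values all match."""
    try:
        # check_dtype=True (the default) makes a dtype mismatch an error.
        assert_frame_equal(left, right, check_dtype=True)
    except AssertionError:
        return False
    return True


# Assumed reconstruction of A and B from the question.
A = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": [1, 2, 3, 4], "c": ["a", "b", "c", "d"]})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "w"], "c": ["a", "b", "c", "d"]})

print(strictly_equal(A, A.copy()))
print(strictly_equal(A, B))
```

When it fails, assert_frame_equal also reports which column and attribute differ, which is handy for debugging.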

Upvotes: 1

U13-Forward

Reputation: 71570

From the docs:

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

Determines if two NDFrame objects contain the same elements!!!

ELEMENTS, not including COLUMNS.

So that's why it returns True.

If you want it to return False and check the columns, do:

print((A==B).all().all())

Output:

False
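One caveat with the == approach: element-wise comparison treats NaN as unequal to NaN, whereas equals() considers NaNs in the same location equal (as the docs quote above says). A small sketch with made-up data, where the frame names left and right are just for illustration:

```python
import numpy as np
import pandas as pd

# Two identical frames that contain a NaN in the same position.
left = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})
right = left.copy()

# Element-wise: the NaN cell compares as False, so the overall result is False.
print((left == right).all().all())

# equals(): NaNs in the same location count as equal.
print(left.equals(right))
```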

Upvotes: 2

sacuL

Reputation: 51335

As in the answer you linked in your question, essentially the behaviour of pandas.DataFrame.equals mimics numpy.array_equal. The docs for np.array_equal state that it returns:

True if two arrays have the same shape and elements, False otherwise.

Your two dataframes satisfy this.
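To make this concrete, here is a rough sketch of what np.array_equal reports for the frames in the question. It assumes the frames are built as below, and the grouping by dtype is only my way of imitating the IntBlock/ObjectBlock layout shown in the question's ._data output, not pandas' actual implementation.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of A and B from the question.
A = pd.DataFrame({"a": ["x", "y", "z", "w"], "b": [1, 2, 3, 4], "c": ["a", "b", "c", "d"]})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "z", "w"], "c": ["a", "b", "c", "d"]})

# Whole-frame values, kept in column order.
whole = np.array_equal(A.to_numpy(), B.to_numpy())

# Values grouped by kind of dtype, ignoring which column label they sit under.
numeric = np.array_equal(
    A.select_dtypes(include="number").to_numpy(),
    B.select_dtypes(include="number").to_numpy(),
)
other = np.array_equal(
    A.select_dtypes(exclude="number").to_numpy(),
    B.select_dtypes(exclude="number").to_numpy(),
)
print(whole, numeric, other)
```

So the "same shape and elements" condition holds only when the values are compared group by group, without regard to column position.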

Upvotes: 1
