Cecilia

Reputation: 309

Why do I get different results when comparing two dataframes?

I am comparing two dataframes. .equals() returns False, but if I append the two dataframes together and call drop_duplicates(), no extra rows remain. Can someone explain this?

Upvotes: 2

Views: 1592

Answers (2)

piRSquared

Reputation: 294228

TL;DR

These are completely different operations and I would never have expected them to produce the same results.

pandas.DataFrame.equals

Will return a boolean value depending on whether Pandas determines that the dataframes being compared are the "same". That means that the index of one is the "same" as the index of the other, the columns of one are the "same" as the columns of the other, and the data of one is the "same" as the data of the other.

See docs

It is NOT the same as pandas.DataFrame.eq which will return a dataframe of boolean values.
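A minimal sketch of that difference, using two small hypothetical frames `a` and `b` (not the ones from the setup below): `equals` gives a single boolean, while `eq` gives one boolean per cell.

```python
import pandas as pd

# Two hypothetical frames that differ in exactly one cell.
a = pd.DataFrame({"A": [0, 2], "B": [1, 3]})
b = pd.DataFrame({"A": [0, 2], "B": [1, 9]})

same = a.equals(b)       # one boolean for the whole comparison
elementwise = a.eq(b)    # a DataFrame of booleans, one per cell

print(same)
print(elementwise)
```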

Setup

Consider these three dataframes

import pandas as pd

df0 = pd.DataFrame([[0, 1], [2, 3]], [0, 1], ['A', 'B'])
df1 = pd.DataFrame([[1, 0], [3, 2]], [0, 1], ['B', 'A'])
df2 = pd.DataFrame([[0, 1], [2, 3]], ['foo', 'bar'], ['A', 'B'])

df0              df1              df2      

   A  B             B  A               A  B
0  0  1          0  1  0          foo  0  1
1  2  3          1  3  2          bar  2  3

If we check whether df1 equals df0, we get

df0.equals(df1)

False

Even though all elements are the same once they are matched up by label

df0.eq(df1).all().all()

True

And that is because the columns are not in the same order. If I sort the columns then ...

df0.equals(df1.sort_index(axis=1))

True
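Sorting works here because the columns of df0 are already in sorted order. One alternative, sketched here as an illustration rather than something the steps above rely on, is to reorder the columns of df1 to match df0 directly, which should also cover frames whose columns are not sorted.

```python
import pandas as pd

df0 = pd.DataFrame([[0, 1], [2, 3]], [0, 1], ['A', 'B'])
df1 = pd.DataFrame([[1, 0], [3, 2]], [0, 1], ['B', 'A'])

# Select df1's columns in df0's column order before comparing.
aligned = df1[df0.columns]
result = df0.equals(aligned)

print(result)
```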

pandas.DataFrame.drop_duplicates

Compares the values in rows and doesn't care about the index.

So, both of these produce the same looking results

df0.append(df2).drop_duplicates()

and

df0.append(df1, sort=True).drop_duplicates()

   A  B
0  0  1
1  2  3

When I append (or pandas.concat), Pandas will align the columns and add the appended dataframe as new rows. Then drop_duplicates does its thing. But it was the inherent aligning of the columns that did what I did above with sort_index and axis=1.
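DataFrame.append was removed in pandas 2.0, so on recent versions the two calls above need pandas.concat instead. A minimal sketch of the same checks, assuming the three frames from the setup (the names `dedup_cols` and `dedup_index` are only for illustration):

```python
import pandas as pd

df0 = pd.DataFrame([[0, 1], [2, 3]], [0, 1], ['A', 'B'])
df1 = pd.DataFrame([[1, 0], [3, 2]], [0, 1], ['B', 'A'])
df2 = pd.DataFrame([[0, 1], [2, 3]], ['foo', 'bar'], ['A', 'B'])

# concat matches columns by label, so df1's rows land under A and B.
dedup_cols = pd.concat([df0, df1]).drop_duplicates()

# drop_duplicates compares row values only; the 'foo'/'bar' labels are ignored.
dedup_index = pd.concat([df0, df2]).drop_duplicates()

print(dedup_cols)
print(dedup_index)
```

In both cases only the two rows that came from df0 should remain, because by default the first occurrence of each duplicated row is the one that is kept.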

Upvotes: 2

Pierre Chen

Reputation: 9

Maybe the rows in the two dataframes are not ordered the same way? The dataframes will be equal only when the rows at the same index are the same.
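A small sketch of that case, using hypothetical frames `left` and `right` that hold the same rows in a different order:

```python
import pandas as pd

left = pd.DataFrame({"A": [0, 2], "B": [1, 3]}, index=[0, 1])
right = left.iloc[::-1]  # same rows, reversed order (index 1, then 0)

unordered = left.equals(right)               # row order differs
reordered = left.equals(right.sort_index())  # row order restored

print(unordered)
print(reordered)
```

Sorting both frames by index before calling equals is one way to rule out row order as the cause.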

Upvotes: 0
