Reputation: 309
I am comparing two df, it gives me False
when using .equals()
, but if I append two df together and use drop_duplicate()
it gives me nothing. Can someone explain this?
Upvotes: 2
Views: 1592
Reputation: 294228
These are completely different operations and I'd have never expected them to produce the same results.
pandas.DataFrame.equals
Will return a boolean value depending on whether Pandas determines that the dataframes being compared are the "same". That means that the index of one is the "same" as the index of the other, the columns of one is the "same" as the columns of the the other, and the data of one is the "same" as the data of the other.
See docs
It is NOT the same as pandas.DataFrame.eq
which will return a dataframe of boolean values.
Consider these three dataframes
df0 = pd.DataFrame([[0, 1], [2, 3]], [0, 1], ['A', 'B'])
df1 = pd.DataFrame([[1, 0], [3, 2]], [0, 1], ['B', 'A'])
df2 = pd.DataFrame([[0, 1], [2, 3]], ['foo', 'bar'], ['A', 'B'])
df0 df1 df2
A B B A A B
0 0 1 0 1 0 foo 0 1
1 2 3 1 3 2 bar 2 3
If we checked if df1
was equals
to df0
, we get
df0.equals(df1)
False
Even though all elements are the same
df0.eq(df1).all().all()
True
And that is because the columns are not aligned. If I sort the columns then ...
df0.equals(df1.sort_index(axis=1))
True
pandas.DataFrame.drop_duplicates
Compares the values in rows and doesn't care about the index.
So, both of these produce the same looking results
df0.append(df2).drop_duplicates()
and
df0.append(df1, sort=True).drop_duplicates()
A B
0 0 1
1 2 3
When I append
(or pandas.concat
), Pandas will align the columns and add the appended dataframe as new rows. Then drop_duplicates
does it's thing. But it was the inherent aligning of the columns that does the what I did above with sort_index
and axis=1
.
Upvotes: 2
Reputation: 9
maybe the lines in both dataframes are not ordered the same way? dataframes will be equal when the lines corresponding to the same index are the same
Upvotes: 0