Reputation: 13715
I would like to compare two pandas DataFrames for equality:
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])
As far as I am concerned these two dataframes are identical; however, I realize pandas does not agree because the rows are not in the same order. My thought was that I could sort and then reset the index:
foo = foo.sort_values('Distance').reset_index(drop=True)
bar = bar.sort_values('Distance').reset_index(drop=True)
Sorting on Distance alone gives different results because three rows tie at 2.0, and the order of tied rows depends on the initial ordering of each dataframe. As a result, they don't evaluate as being equal:
foo.equals(bar)
False
I could first sort on Group and then on Distance, and this would return True. However, in dealing with larger dataframes I'm concerned about having to explicitly define sorting rules each time. Is there a better way of comparing two differently ordered dataframes?
Upvotes: 1
Views: 100
Reputation: 27879
Sort both dataframes by all of their columns, so you never have to spell out the sort keys by hand. This way you can make them evaluate to True:
foo.sort_values(foo.columns.values.tolist()).reset_index(drop=True).equals(bar.sort_values(foo.columns.values.tolist()).reset_index(drop=True))
Or, in separate steps:
foo = foo.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
bar = bar.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
foo.equals(bar)
True
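If you do this often, the same idea can be wrapped in a small function. The following is only a sketch: the name equal_ignoring_row_order is made up for illustration, and it assumes every column holds values that can be sorted and that matching columns have the same dtype in both dataframes (equals treats differing dtypes as unequal).

```python
import pandas as pd


def equal_ignoring_row_order(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if both frames hold the same rows in any order."""
    # Frames with different column sets can never match.
    if sorted(left.columns) != sorted(right.columns):
        return False
    cols = sorted(left.columns)
    # Sort by every column so ties in one column are broken by the others,
    # then drop the old index so only the row values are compared.
    left_sorted = left[cols].sort_values(cols).reset_index(drop=True)
    right_sorted = right[cols].sort_values(cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

print(equal_ignoring_row_order(foo, bar))
```

Because the sort keys come from the columns themselves, nothing has to be redefined for wider dataframes. Duplicate rows are still counted, so a dataframe with an extra copy of a row does not compare as equal.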
Upvotes: 2