Reputation: 21
Example: I have three DataFrames, titanic, titanic_new, and titanic_copy, which hold identical data.
I used the following code to compare all three and got the expected result:
(titanic.equals(titanic_copy)) and (titanic.equals(titanic_new)) and (titanic_copy.equals(titanic_new))
Output: True
Is there a better way to compare three DataFrames, or a built-in method for comparing three or more DataFrames?
TIA
Upvotes: 2
Views: 255
Reputation: 402473
This expression returns True if all the DataFrames in df_list are equal:
all(x.equals(y) for x, y in zip(df_list[:-1], df_list[1:]))
To understand why this works, consider:
df_list = [dfA, dfB, dfC]
Our expression computes the following:
dfA.equals(dfB)
dfB.equals(dfC)
If both of these conditions are True, we know all the frames are equal, because equality is transitive: if A equals B and B equals C, then A equals C, and so on for longer lists.
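The same transitivity argument means you can also compare every frame against the first one instead of against its neighbour. A minimal sketch of that idea wrapped in a helper follows; the function name all_frames_equal and the sample data are made up for illustration and are not part of pandas:

```python
import pandas as pd

def all_frames_equal(frames):
    """Return True if every DataFrame in `frames` equals the first one.

    Hypothetical helper: it relies only on DataFrame.equals and on
    transitivity (if every frame equals the first, all are equal).
    """
    frames = list(frames)
    if len(frames) < 2:
        # Zero or one frame: nothing to compare, so treat as equal.
        return True
    first = frames[0]
    return all(first.equals(other) for other in frames[1:])

# Made-up stand-ins for the frames in the question.
titanic = pd.DataFrame({"age": [22, 38], "fare": [7.25, 71.28]})
titanic_new = titanic.copy()
titanic_copy = titanic.copy()

result = all_frames_equal([titanic, titanic_new, titanic_copy])
print(result)
```

Because all() short-circuits, the helper stops at the first frame that differs, so it makes at most n - 1 calls to equals for n frames.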
Minimal Example
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
dfl1 = [df, df, df, df, df]
dfl2 = [df2, df, df2]
all(x.equals(y) for x, y in zip(dfl1[1:], dfl1[:-1]))
# True
all(x.equals(y) for x, y in zip(dfl2[1:], dfl2[:-1]))
# False
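If the check comes back False, it can help to know where the list first breaks. The sketch below is one possible way to find that, using the same adjacent-pair comparison; first_mismatch is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def first_mismatch(frames):
    """Return the index i of the first adjacent pair (frames[i], frames[i + 1])
    that is not equal, or None if every adjacent pair is equal.

    Hypothetical helper built on DataFrame.equals.
    """
    for i, (x, y) in enumerate(zip(frames[:-1], frames[1:])):
        if not x.equals(y):
            return i
    return None

df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

same = first_mismatch([df, df, df])       # every pair matches
broken = first_mismatch([df2, df, df2])   # the very first pair differs
print(same, broken)
```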
Upvotes: 2