Reputation: 489
I have two DataFrames with the same index but different column names. The number of columns is the same. I want to check, row by row, 1) whether they have the same set of values regardless of column order, and 2) whether they have the same values respecting column order.
import pandas as pd

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)
This is the output I need.
      OpX    OpY
----------------
aaa   True   True
bbb   False  True
ccc   False  False
Could anyone help me with OpX and OpY?
Upvotes: 3
Views: 22521
Reputation: 323226
Using tuple and set: a tuple keeps the column order, while a set ignores it.
s1 = df1.apply(tuple, axis=1) == df2.apply(tuple, axis=1)
s2 = df1.apply(set, axis=1) == df2.apply(set, axis=1)
pd.concat([s1, s2], axis=1)  # axis must be passed by keyword in pandas 2.x
Out[746]:
0 1
aaa True True
bbb False True
ccc False False
Since cs95 mentioned that apply has problems here, a NumPy version without apply:
s = np.equal(df1.values, df2.values).all(1)
t = np.equal(np.sort(df1.values, 1), np.sort(df2.values, 1)).all(1)
pd.DataFrame(np.column_stack([s, t]), index=df1.index)
Out[754]:
0 1
aaa True True
bbb False True
ccc False False
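One thing to keep in mind when choosing between the two versions above: they agree on the question's data, but they are not strictly equivalent. A set drops duplicates, while sorting keeps them, so rows with the same distinct values but different counts compare equal under set and unequal under np.sort. A minimal sketch, where the frames a and b are hypothetical examples that are not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frames (not from the question) with a repeated value in each row
a = pd.DataFrame({'old1': ['A'], 'old2': ['A'], 'old3': ['B']}, index=['row'])
b = pd.DataFrame({'new1': ['A'], 'new2': ['B'], 'new3': ['B']}, index=['row'])

# Sets discard duplicates: {'A', 'B'} == {'A', 'B'}
as_sets = a.apply(set, axis=1) == b.apply(set, axis=1)

# Sorting keeps duplicates: ['A', 'A', 'B'] vs ['A', 'B', 'B']
as_sorted = (np.sort(a.values, axis=1) == np.sort(b.values, axis=1)).all(axis=1)

print(as_sets.tolist(), as_sorted.tolist())  # [True] [False]
```

Which behaviour is right depends on whether "same set of values" is meant to count repeats.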
Upvotes: 3
Reputation: 402333
Here's a solution that is performant and should scale. First, align the DataFrames on the index so you can compare them easily.
df3 = df2.set_axis(df1.columns, axis=1)  # the inplace argument was removed in pandas 2.0
df4, df5 = df1.align(df3)
For the order-sensitive check (OpX), compare element-wise with DataFrame.eq (or just use the == operator) and require every column in the row to match:
u = (df4 == df5).all(axis=1)
u
aaa True
bbb False
ccc False
dtype: bool
The order-insensitive check (OpY) is slightly more complex: sort the values within each row (np.sort sorts along the last axis by default), then compare.
v = pd.Series((np.sort(df4) == np.sort(df5)).all(axis=1), index=u.index)
v
aaa True
bbb True
ccc False
dtype: bool
Concatenate the results:
pd.concat([u, v], axis=1, keys=['X', 'Y'])
X Y
aaa True True
bbb False True
ccc False False
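To illustrate why the relabel-and-align step is worth having, here is a small sketch under one assumption: the rows of df2 arrive in a different order than those of df1. The frame df2_shuffled is a hypothetical variation, not something from the question, and the code assumes pandas 1.0 or later (where set_axis returns a new frame by default). A raw NumPy comparison then works by position, while the aligned comparison works by label:

```python
import pandas as pd

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)

# Hypothetical variation: the same rows as df2, stored in a different order
df2_shuffled = df2.loc[['ccc', 'aaa', 'bbb']]

# Give both frames the same column labels, then align them on the row labels
df3 = df2_shuffled.set_axis(df1.columns, axis=1)
left, right = df1.align(df3)

# Label-based comparison: rows are matched by index value
by_label = (left == right).all(axis=1)

# Position-based comparison: row i of one array against row i of the other
by_position = (df1.values == df2_shuffled.values).all(axis=1)

print(by_label.tolist())     # [True, False, False]
print(by_position.tolist())  # [False, True, False]
```

With identical index order, as in the question, both comparisons give the same answer and the align step is only a safeguard.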
Upvotes: 2
Reputation: 71570
Construct a new DataFrame and check the equality:
df3 = pd.DataFrame(index=ind)
df3['OpX'] = (df1.values == df2.values).all(1)
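# Sort with NumPy directly; depending on the pandas version, df.apply(np.sort, axis=1) can return a Series of arrays instead of a 2-D result
df3['OpY'] = (np.sort(df1.values, axis=1) == np.sort(df2.values, axis=1)).all(1)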
print(df3)
Output:
OpX OpY
aaa True True
bbb False True
ccc False False
Upvotes: 0
Reputation: 3130
For item 2):
(df1.values == df2.values).all(axis=1)
This checks element-wise equality of the dataframes, and gives True when all entries in a row are equal.
For item 1), sort the values along each row first:
import numpy as np
(np.sort(df1.values, axis=1) == np.sort(df2.values, axis=1)).all(axis=1)
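If both checks are needed together, the two expressions above could be wrapped in a small helper that returns the table from the question. This is only a sketch: compare_rows is a hypothetical name (not a pandas function), and it assumes both frames have the same shape and the same index in the same order.

```python
import numpy as np
import pandas as pd

def compare_rows(left, right):
    """Row-wise comparison of two frames; hypothetical helper, not part of pandas."""
    # Assumption: same shape and identical index, as in the question
    if left.shape != right.shape or not left.index.equals(right.index):
        raise ValueError("frames must have the same shape and index")
    a, b = left.to_numpy(), right.to_numpy()
    # OpX: values match position by position (column order matters)
    same_order = (a == b).all(axis=1)
    # OpY: values match after sorting each row (column order ignored)
    same_values = (np.sort(a, axis=1) == np.sort(b, axis=1)).all(axis=1)
    return pd.DataFrame({'OpX': same_order, 'OpY': same_values}, index=left.index)

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)

result = compare_rows(df1, df2)
print(result)
```

Note that np.sort needs the values in a row to be mutually comparable, so rows mixing strings with numbers or NaN would need extra handling.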
Upvotes: 1