Younghak Jang

Reputation: 489

Check whether two rows in pandas DataFrames have the same set of values, with and without regard to column order

I have two DataFrames with the same index but different column names. The number of columns is the same. I want to check, index by index, 1) whether they have the same set of values regardless of column order, and 2) whether they have the same values in the same column order.

import pandas as pd

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)

This is the output I need.

     OpX   OpY
-------------
aaa  True  True
bbb  False True
ccc  False False

Could anyone help me with OpX and OpY?

Upvotes: 3

Views: 22521

Answers (4)

BENY

Reputation: 323226

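Using tuple and set: tuple keeps the column order, while set discards it.

# axis is passed by keyword; pd.concat no longer accepts it positionally in pandas 2.x
s1 = df1.apply(tuple, axis=1) == df2.apply(tuple, axis=1)
s2 = df1.apply(set, axis=1) == df2.apply(set, axis=1)
pd.concat([s1, s2], axis=1)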
Out[746]: 
         0      1
aaa   True   True
bbb  False   True
ccc  False  False

Since cs95 mentioned that apply is problematic here, a NumPy version without apply:

s=np.equal(df1.values,df2.values).all(1)
t=np.equal(np.sort(df1.values,1),np.sort(df2.values,1)).all(1)
pd.DataFrame(np.column_stack([s,t]),index=df1.index)
Out[754]: 
         0      1
aaa   True   True
bbb  False   True
ccc  False  False
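
One caveat, sketched here with made-up single-row frames (the names a and b and their values are assumptions, not data from the question): comparing sets ignores how often a value occurs, whereas comparing sorted rows does not. The two order-insensitive approaches above can therefore disagree when a row contains repeated values.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to contain repeated values.
a = pd.DataFrame({'old1': ['A'], 'old2': ['A'], 'old3': ['B']}, index=['row'])
b = pd.DataFrame({'new1': ['A'], 'new2': ['B'], 'new3': ['B']}, index=['row'])

# Set comparison: both rows collapse to {'A', 'B'}, so they count as equal.
by_set = pd.Series(
    [set(x) == set(y) for x, y in zip(a.values, b.values)],
    index=a.index,
)

# Sorted comparison: ['A', 'A', 'B'] vs ['A', 'B', 'B'] differ in the middle slot.
by_sort = pd.Series(
    (np.sort(a.values, axis=1) == np.sort(b.values, axis=1)).all(axis=1),
    index=a.index,
)

print(by_set)
print(by_sort)
```

Which behaviour is wanted depends on whether "same set of values" should take duplicates into account.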

Upvotes: 3

cs95

Reputation: 402333

Here's a solution that is performant and should scale. First, align the DataFrames on the index so you can compare them easily.

# set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0
df3 = df2.set_axis(df1.columns, axis=1)
df4, df5 = df1.align(df3)

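For the order-sensitive check (OpX), compare element-wise with == and reduce each row with all. Note that DataFrame.equals would return a single bool for the whole frame, not one per row: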

u = (df4 == df5).all(axis=1)
u

aaa     True
bbb    False
ccc    False
dtype: bool

The order-insensitive check (OpY) is slightly more complex: sort the values within each row (np.sort sorts along the last axis by default), then compare.

v = pd.Series((np.sort(df4) == np.sort(df5)).all(axis=1), index=u.index)
v

aaa     True
bbb     True
ccc    False
dtype: bool

Concatenate the results:

pd.concat([u, v], axis=1, keys=['X', 'Y'])

         X      Y
aaa   True   True
bbb  False   True
ccc  False  False
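
To illustrate why the alignment step can matter, here is a minimal sketch with hypothetical frames (left and right are made-up names and data) whose rows are stored in a different index order. A positional comparison of .values pairs the wrong rows, while align first reindexes both frames to a shared index.

```python
import pandas as pd

left = pd.DataFrame({'c1': ['A', 'B'], 'c2': ['C', 'D']}, index=['aaa', 'bbb'])
# Same rows as `left`, but stored in the opposite index order.
right = pd.DataFrame({'c1': ['B', 'A'], 'c2': ['D', 'C']}, index=['bbb', 'aaa'])

# Positional comparison pairs row 'aaa' with row 'bbb' and reports mismatches.
positional = (left.values == right.values).all(axis=1)

# align() reindexes both frames to a common index and common columns.
left_aligned, right_aligned = left.align(right)
aligned = (left_aligned == right_aligned).all(axis=1)

print(positional)
print(aligned)
```

When both frames already share the same index in the same order, as in the question, the alignment is a no-op and the .values comparison gives the same answer.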

Upvotes: 2

U13-Forward

Reputation: 71570

Construct a new DataFrame and check the equality:

df3 = pd.DataFrame(index=ind)
df3['OpX'] = (df1.values == df2.values).all(1)
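# result_type='expand' makes apply return a DataFrame rather than a Series of arrays
df3['OpY'] = (df1.apply(np.sort, axis=1, result_type='expand').values == df2.apply(np.sort, axis=1, result_type='expand').values).all(1)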
print(df3)

Output:

       OpX    OpY
aaa   True   True
bbb  False   True
ccc  False  False

Upvotes: 0

Ken Wei

Reputation: 3130

For item 2):

(df1.values == df2.values).all(axis=1)

This checks element-wise equality of the dataframes, and gives True when all entries in a row are equal.

For item 1), sort the values along each row first:

import numpy as np
(np.sort(df1.values, axis=1) == np.sort(df2.values, axis=1)).all(axis=1)
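
Putting the two expressions together, a minimal end-to-end sketch that produces the table asked for in the question could look like this. The helper name compare_rows is hypothetical, and the sketch assumes both frames have the same shape and the same index order.

```python
import numpy as np
import pandas as pd

def compare_rows(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Row-wise comparison; assumes equal shapes and identical index order."""
    a, b = left.values, right.values
    # OpX: same values in the same column positions.
    op_x = (a == b).all(axis=1)
    # OpY: same values once each row is sorted, so column order is ignored.
    op_y = (np.sort(a, axis=1) == np.sort(b, axis=1)).all(axis=1)
    return pd.DataFrame({'OpX': op_x, 'OpY': op_y}, index=left.index)

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A', 'A', 'A'], 'old2': ['B', 'B', 'B'], 'old3': ['C', 'C', 'C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A', 'A', 'A'], 'new2': ['B', 'C', 'B'], 'new3': ['C', 'B', 'D']}, index=ind)

result = compare_rows(df1, df2)
print(result)
```

Sorting requires the values in a row to be mutually comparable, so mixed types such as strings and numbers in the same row would need to be handled separately.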

Upvotes: 1
