PythonBestie007

Reputation: 39

Compare two Dataframes but just on specific columns

I have two Dataframes (df1 and df2)

df1:
 A   B   C   D
12  52  16  23
19  32  30  09

df2:
 A   G   C   D    E
12  13  16  04  100

I want to create a new column in df1 called 'Compare'. Then I want to compare columns 'A' and 'C' between df1 and df2, and if they are the same, set 'Compare' to 'X' in that row.

result = df1[df1["A"].isin(df2["A"].tolist())] does not work.

Upvotes: 1

Views: 54

Answers (1)

jezrael

Reputation: 862651

You can chain two conditions with & (element-wise AND) or | (element-wise OR) and set the new values with numpy.where:

import numpy as np

mask = df1["A"].isin(df2["A"]) & df1["C"].isin(df2["C"])
df1['Compare'] = np.where(mask, 'X', '')
print(df1)
    A   B   C   D Compare
0  12  52  16  23       X
1  19  32  30   9        
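One thing to keep in mind with the isin mask: each column is tested on its own against all values of that column in df2, so a row can be flagged even when its 'A' and 'C' values come from different rows of df2. A minimal sketch of that behaviour, using made-up data (the extra rows below are hypothetical and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's frames extended with one extra row each.
df1 = pd.DataFrame({"A": [12, 19, 12], "B": [52, 32, 7],
                    "C": [16, 30, 30], "D": [23, 9, 1]})
df2 = pd.DataFrame({"A": [12, 40], "G": [13, 5], "C": [16, 30],
                    "D": [4, 8], "E": [100, 200]})

# Each isin() looks at one column only, then the two results are AND-ed.
mask = df1["A"].isin(df2["A"]) & df1["C"].isin(df2["C"])
df1["Compare"] = np.where(mask, "X", "")

# Row 2 has A=12 and C=30. Both values occur in df2, but in different rows,
# so the row is still flagged.
print(df1)
```

If that is acceptable for your data, the mask is the simplest option. If 'A' and 'C' must match within the same df2 row, a pairwise comparison is the safer choice.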

Or use DataFrame.merge with a left join and indicator=True:

s = df1[['A', 'C']].merge(df2[['A', 'C']], on=['A', 'C'], how='left', indicator=True)['_merge']
df1['Compare'] = np.where(s == 'both', 'X', '')
print(df1)
    A   B   C   D Compare
0  12  52  16  23       X
1  19  32  30   9        
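The merge matches 'A' and 'C' together as a pair. One caveat: if df2 contains the same ('A', 'C') pair more than once, a left merge returns more rows than df1 has, and assigning the result back fails with a length mismatch. A hedged sketch that drops duplicate key pairs first, again on made-up data (the duplicated df2 row and the names keys and merged are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: df2 repeats the (12, 16) pair on purpose.
df1 = pd.DataFrame({"A": [12, 19, 12], "B": [52, 32, 7],
                    "C": [16, 30, 30], "D": [23, 9, 1]})
df2 = pd.DataFrame({"A": [12, 12, 40], "G": [13, 13, 5], "C": [16, 16, 30],
                    "D": [4, 4, 8], "E": [100, 100, 200]})

# Keep one copy of each (A, C) pair so the left merge cannot add rows.
keys = df2[["A", "C"]].drop_duplicates()

# indicator=True adds a '_merge' column: 'both' means the pair exists in df2.
merged = df1[["A", "C"]].merge(keys, on=["A", "C"], how="left", indicator=True)

# np.where returns a plain array, so the assignment is positional.
df1["Compare"] = np.where(merged["_merge"] == "both", "X", "")
print(df1)
```

Here only the first row is flagged, because (12, 30) does not occur as a pair in df2 even though both values appear in it separately.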

Upvotes: 1
