Reputation: 397
I have dataframe df1
as following-
Second dataframe df2
is as following-
and I want the resulted dataframe as following
Dataframe df1 & df2 contains a large number of columns and data but here I am showing sample data. My goal is to compare Customer and ID column of df1 with Customer and Part Number of df2. Comparison is to find mismatch of data of df1['Customer'] and df1['ID'] with df2['Customer'] and df2['Part Number']. Finally storing mismatch data in another dataframe df3
. For example: Customer(rishab) with ID(89ab) is present in df1 but not in df2
.Thus Customer, Order#, and Part are stored in df3
.
I am using isin()
method to find mismatch of df1
with df2
for one column only but not able to do it for comparison of two columns.
df3 = df1[~df1['ID'].isin(df2['Part Number'].values)]
#here I am only able to find mismatch based upon only 1 column ID but I want to include Customer also
I can use loop also but the data is very large(Time complexity will increase) and I am sure there can be one-liner code to achieve this task. I have also tried to use merge but not able to produce the exact output.
So, how to produce this exact output? I am also not able to use isin()
for two columns and I think isin()
cannot to use for two columns
Upvotes: 0
Views: 1701
Reputation: 1204
The easiest way to achieve this is:
df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'], how='left', indicator=True)
df3.reset_index(inplace = True)
df3 = df3[df3['_merge'] == 'left_only']
Here, you first do a left join
on the columns, and put indicator = True
, which will give another column like _merge, which has indicator mentioning which side the data exists, and then we pick left_only
from those.
Upvotes: 1
Reputation: 1226
You can try outer join to get non matching rows. Something like df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'], how = "outer")
Upvotes: 0