Vishav Gupta
Vishav Gupta

Reputation: 397

Compare data of two columns of one dataframe with two columns of another dataframe and find mismatch data

I have dataframe df1 as following-

first dataframe

Second dataframe df2 is as following-

second dataframe

and I want the resulted dataframe as following

final dataframe

Dataframe df1 & df2 contains a large number of columns and data but here I am showing sample data. My goal is to compare Customer and ID column of df1 with Customer and Part Number of df2. Comparison is to find mismatch of data of df1['Customer'] and df1['ID'] with df2['Customer'] and df2['Part Number']. Finally storing mismatch data in another dataframe df3. For example: Customer(rishab) with ID(89ab) is present in df1 but not in df2.Thus Customer, Order#, and Part are stored in df3.

I am using isin() method to find mismatch of df1 with df2 for one column only but not able to do it for comparison of two columns.

df3 = df1[~df1['ID'].isin(df2['Part Number'].values)]


#here I am only able to find mismatch based upon only 1  column ID but I want to include Customer also

I can use loop also but the data is very large(Time complexity will increase) and I am sure there can be one-liner code to achieve this task. I have also tried to use merge but not able to produce the exact output.

So, how to produce this exact output? I am also not able to use isin() for two columns and I think isin() cannot to use for two columns

Upvotes: 0

Views: 1701

Answers (2)

srishtigarg
srishtigarg

Reputation: 1204

The easiest way to achieve this is:

df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'],  how='left', indicator=True)
df3.reset_index(inplace = True)
df3 = df3[df3['_merge'] == 'left_only']

Here, you first do a left join on the columns, and put indicator = True, which will give another column like _merge, which has indicator mentioning which side the data exists, and then we pick left_only from those.

Upvotes: 1

XXavier
XXavier

Reputation: 1226

You can try outer join to get non matching rows. Something like df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'], how = "outer")

Upvotes: 0

Related Questions