PepperoniPizza

Reputation: 9102

Pandas difference between dataframes on column values

I couldn't find a way to build a dataframe that holds the difference of two dataframes based on a column. Basically:

dfA = ID, val
      1, test
      2, other test

dfB = ID, val
      2, other test

I want a dfC that holds the difference dfA - dfB based on column ID, i.e. the rows of dfA whose ID does not appear in dfB:

dfC = ID, val
      1, test

Upvotes: 1

Views: 3847

Answers (2)

Haleemur Ali

Reputation: 28243

Merge the dataframes on ID:

dfMerged = dfA.merge(dfB, on='ID', how='outer')  # how defaults to 'inner'; 'outer' keeps IDs from both frames

In the merged dataframe, name collisions between non-key columns are resolved with the suffixes _x and _y, which denote the left and right source dataframes.

So you'll (most likely) end up with val_x and val_y. Compare these columns however you want. For example:

dfMerged['x_y_test'] = dfMerged.val_y == dfMerged.val_x
# gives you a column with a comparison of val_x, val_y.

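Use this as a mask to get to the desired dfC in your question. Rows that exist only in dfA have a missing val_y after the outer merge, so the comparison is False for them (it is also False for rows that exist only in dfB, or whose values differ).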
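As a possible shortcut for the masking step, merge accepts an indicator flag that records which frame each row came from. The following is a minimal sketch, assuming the two example frames from the question. Merging against only the ID column of dfB is my own choice here, made to avoid the val_x / val_y suffixes.

```python
import pandas as pd

# Example frames taken from the question
dfA = pd.DataFrame({"ID": [1, 2], "val": ["test", "other test"]})
dfB = pd.DataFrame({"ID": [2], "val": ["other test"]})

# indicator=True adds a "_merge" column holding
# "left_only", "right_only" or "both" for every row
dfMerged = dfA.merge(dfB[["ID"]], on="ID", how="left", indicator=True)

# Keep the rows found only in dfA, then drop the helper column
dfC = dfMerged[dfMerged["_merge"] == "left_only"].drop(columns="_merge")
print(dfC)
```

A left merge is enough in this sketch because only rows of dfA can end up in dfC.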

Upvotes: 5

jgloves

Reputation: 719

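Does this work for you? It keeps the rows of dfA whose ID does not appear in dfB:

dfC = dfA[~dfA["ID"].isin(dfB["ID"])]

A plain element-wise comparison such as dfB["ID"] == dfA["ID"] is not suitable here: the two frames differ in length, and current pandas versions refuse to compare Series that are not identically labeled. Without the ~, the isin mask gives the rows the two frames share rather than the difference.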
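For the difference asked about in the question, the isin mask can be inverted and applied to dfA. A small runnable sketch, assuming the example frames from the question. The helper name rows_not_in is hypothetical, not part of pandas.

```python
import pandas as pd

def rows_not_in(left, right, key):
    """Return the rows of `left` whose `key` value is absent from `right`.

    Hypothetical helper for illustration only.
    """
    # isin builds a boolean mask of matching keys; ~ inverts it
    return left[~left[key].isin(right[key])]

# Example frames taken from the question
dfA = pd.DataFrame({"ID": [1, 2], "val": ["test", "other test"]})
dfB = pd.DataFrame({"ID": [2], "val": ["other test"]})

dfC = rows_not_in(dfA, dfB, "ID")
print(dfC)
```

Unlike a merge, this keeps the original index and column names of dfA unchanged.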

Upvotes: 1
