Reputation: 9102
I couldn't find a way to have a dataframe that has the difference of 2 dataframes based on a column. So basically:
dfA = ID, val
1, test
2, other test
dfB = ID, val
2, other test
I want to have a dfC that holds the difference dfA - dfB based on column ID:
dfC = ID, val
1, test
Upvotes: 1
Views: 3847
Reputation: 28243
Merge the dataframes on ID:
dfMerged = dfA.merge(dfB, on='ID', how='outer') # how='outer' keeps unmatched rows from both sides; the default is an inner join.
In the merged dataframe, name collisions are avoided using the suffixes _x and _y to denote the left and right source dataframes. So you'll end up with (most likely) val_x and val_y. Compare these columns however you want to. For example:
dfMerged['x_y_test'] = dfMerged.val_y == dfMerged.val_x
# gives you a column with a comparison of val_x, val_y.
Use this as a mask (inverted, since the comparison is True for rows that match in both dataframes) to get to the desired dfC in your question.
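One way to build such a mask, not necessarily what this answer had in mind, is the indicator=True option of merge, which adds a _merge column saying where each row came from. The sketch below is a minimal example that assumes the sample data from the question and that "difference" means rows of dfA whose ID is absent from dfB; the mask variable is only illustrative.

```python
import pandas as pd

# Sample data from the question.
dfA = pd.DataFrame({"ID": [1, 2], "val": ["test", "other test"]})
dfB = pd.DataFrame({"ID": [2], "val": ["other test"]})

# Outer merge on ID; indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both".
dfMerged = dfA.merge(dfB, on="ID", how="outer", indicator=True)

# Keep the rows that exist only in dfA, then restore the original column name.
mask = dfMerged["_merge"] == "left_only"
dfC = (
    dfMerged.loc[mask, ["ID", "val_x"]]
    .rename(columns={"val_x": "val"})
    .reset_index(drop=True)
)
print(dfC)
```

Filtering on _merge avoids comparing val_x with val_y directly, which matters if val can legitimately contain missing values.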
Upvotes: 5
Reputation: 719
Does this work for you?
dfC = dfB[dfB["ID"] == dfA["ID"]]
How about this, which keeps the rows of dfA whose ID does not appear in dfB:
dfC = dfA[~dfA["ID"].isin(dfB["ID"])]
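For reference, here is a minimal runnable sketch of the isin approach on the question's data. It assumes the goal is the rows of dfA whose ID is missing from dfB, so the membership test is run on dfA and negated with ~. The in_b variable is only illustrative. Note that the element-wise == in the first attempt compares the two ID columns label by label, and pandas raises an error when the two Series are not identically labeled, as is the case here.

```python
import pandas as pd

# Sample data from the question.
dfA = pd.DataFrame({"ID": [1, 2], "val": ["test", "other test"]})
dfB = pd.DataFrame({"ID": [2], "val": ["other test"]})

# True for every row of dfA whose ID also appears in dfB.
in_b = dfA["ID"].isin(dfB["ID"])

# Negate the mask to keep only the rows of dfA that are absent from dfB.
dfC = dfA[~in_b].reset_index(drop=True)
print(dfC)
```

Without the ~, the same expression would give the rows the two dataframes have in common rather than the difference.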
Upvotes: 1