Reputation: 2253
I have two dataframes, A and B, and I want to get those in A but not in B, just like the one right below the top left corner.
Dataframe A has columns ['a','b' + others]
and B has columns ['a','b' + others]
. There are no NaN values. I tried the following:
1.
dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) | (~dfA['b'].isin(dfm['b'])
2.
dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) & (~dfA['b'].isin(dfm['b'])
3.
dfe = dfA[(~dfA['a'].isin(dfB['a']) | (~dfA['b'].isin(dfB['b'])
4.
dfe = dfA[(~dfA['a'].isin(dfB['a']) & (~dfA['b'].isin(dfB['b'])
but when I get len(dfm)
and len(dfe)
, they don't sum up to dfA
(it's off by a few numbers). I've tried doing this on dummy cases and #1 works, so maybe my dataset may have some peculiarities I am unable to reproduce.
What's the right way to do this?
Upvotes: 54
Views: 71252
Reputation: 211
I think it would go something like the examples in: Pandas left outer join multiple dataframes on multiple columns
dfe = pd.merge(dFA, dFB, how='left', on=['a','b'], indicator=True)
dfe[dfe['_merge'] == 'left_only']
Upvotes: 7