irene
irene

Reputation: 2253

How to do left outer join exclusion in pandas

I have two dataframes, A and B, and I want to get those in A but not in B, just like the one right below the top left corner.

The one below the top left

Dataframe A has columns ['a','b' + others] and B has columns ['a','b' + others]. There are no NaN values. I tried the following:

1.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) | (~dfA['b'].isin(dfm['b'])

2.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) & (~dfA['b'].isin(dfm['b'])

3.

dfe = dfA[(~dfA['a'].isin(dfB['a']) | (~dfA['b'].isin(dfB['b'])

4.

dfe = dfA[(~dfA['a'].isin(dfB['a']) & (~dfA['b'].isin(dfB['b'])

but when I get len(dfm) and len(dfe), they don't sum up to dfA (it's off by a few numbers). I've tried doing this on dummy cases and #1 works, so maybe my dataset may have some peculiarities I am unable to reproduce.

What's the right way to do this?

Upvotes: 54

Views: 71252

Answers (2)

Justin
Justin

Reputation: 211

I think it would go something like the examples in: Pandas left outer join multiple dataframes on multiple columns

dfe = pd.merge(dFA, dFB, how='left', on=['a','b'], indicator=True)
dfe[dfe['_merge'] == 'left_only']

Upvotes: 7

phi
phi

Reputation: 11714

Check out this link

df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True)
df = df[df['_merge'] == 'left_only']

One liner :

df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True
              ).query('_merge=="left_only"')

Upvotes: 81

Related Questions