How to subtract a dataframe from a dataframe based on columns?

Question

I have below dataframes

df1 = pd.DataFrame({
    'contact_id': [1,3,4,5,-1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff']
});

print(df1)

   contact_id subscription_id
0           1             AAA
1           3             ccc
2           4             ddd
3           5             eee
4          -1             fff

2nd dataframe

df2 = pd.DataFrame({
    'contact_id': [1,2,-1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op']
});

print(df2)

   contact_id subscription_id extra
0           1             AAA    we
1           2             bbb    kl
2          -1             fff    op

Expected Output

   contact_id subscription_id extra
1           3             ccc   NaN
2           4             ddd   NaN
3           5             eee   NaN

My Code

import pandas as pd

df1 = pd.DataFrame({
    'contact_id': [1,3,4,5,-1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff']
});

print(df1)

df2 = pd.DataFrame({
    'contact_id': [1,2,-1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op']
});

print(df2)

sub = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
print(sub)

Can anyone guide me where i am doing wrong?

Mayank Porwal · Accepted Answer

What you want is basically result of Left join minus result of Inner Join. This looks like a typical case of merge not pd.concat.

Use df.merge with Left join and indicator column as True. Pick rows which are present in df1 only by choosing left_only:

In [1586]: df1.merge(df2, how='left', indicator=True).query('_merge == "left_only"').drop('_merge', 1)
Out[1586]: 
   contact_id subscription_id extra
1           3             ccc   NaN
2           4             ddd   NaN
3           5             eee   NaN

How to subtract a dataframe from a dataframe based on columns?

Answers (2)

Related Questions