Check if two column values found in other pandas dataframe

Question

Set-up

I have 2 pandas dfs (df1 and df2) which contain some overlapping rows and some non-overlapping rows.

Both dfs have the columns order_id and shop.

Now, if a df1 row matches with a df2 row on the combination of order_id and shop, then nothing. But if a df1 row doesn't match with a df2 row on the combination of order_id and shop, then this df1 row should be added to df2.

Example

df2 is such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'
2     12345     'DE'
3     34567     'NL'

Now if df1 such that,

    order_id    shop
0     12345     'NL'
1     45678     'FR'

then nothing.

But if df1 such that,

        order_id    shop
0       12345       'NL'
1       99999       'FR'
2       12345       'UK'

then row 1 and 2 should be added to df2 even though shop value for row 1 and order_id for row 2 are already in df2.

The resulting df2 should look like,

    order_id    shop
0     99999     'FR'
1     12345     'UK'        
2     12345     'NL'
3     45678     'FR'
4     12345     'DE'
5     34567     'NL'

Note that the order_id column is int and the shop column is string.

So in graphical terms what I want to achieve looks like:

Code

I created a monstrous line which then didn't really work...

So far, I have,

result_df = df1[(~df1['order_id'].astype(str).isin(df2['order_id'].astype(str)))]

How do I solve this?

EXTRA

if df1 such that,

        order_id    shop
0       12345       'NL'
1       99999       'FR'
2       12345       'UK'

how do I compare df1 with df2 such that I get df3 which looks like,

        order_id    shop
0       99999       'FR'
1       12345       'UK'

jezrael · Accepted Answer

If all rows are unique use concat with drop_duplicates:

df = pd.concat([df2, df1], ignore_index=True).drop_duplicates()
print (df)
   order_id  shop
0     12345  'NL'
1     45678  'FR'
2     12345  'DE'
3     34567  'NL'
5     99999  'FR'
6     12345  'UK'

If not unique filter not equal rows by merge with indicator=True and then concat to df2:

df3 = df1.merge(df2, how='left', indicator=True).query('_merge == "left_only"')[df1.columns]
df = pd.concat([df2, df3], ignore_index=True)

Check if two column values found in other pandas dataframe

Answers (2)

Related Questions