Reputation: 1971
Set-up
I have 2 pandas dfs (df1 and df2) which contain some overlapping rows and some non-overlapping rows.
Both dfs have the columns order_id
and shop
.
Now, if a row in df1 matches any row in df2 on the combination of order_id
and shop
, then this row should be dropped from df1. If this row doesn't match any row in df2 on order_id
and shop
, it should be kept.
Example
df2
is such that,
order_id shop
0 12345 'NL'
1 45678 'FR'
2 12345 'DE'
3 34567 'NL'
Now if df1
such that,
order_id shop
0 12345 'NL'
1 45678 'FR'
then df1 should return empty.
But if df1
such that,
order_id shop
0 12345 'NL'
1 99999 'FR'
2 12345 'UK'
then df1
should return,
order_id shop
0 99999 'FR'
1 12345 'UK'
Code
I created a monstrous line which then didn't really work...
So far, I have,
result_df = df1[(~df1['order_id'].astype(str).isin(df2['order_id'].astype(str)))]
How do I solve this?
Upvotes: 2
Views: 731
Reputation: 862641
I think there are not same types of columns, so first convert it to string
and then merge
with indicator=True
:
df3 = (df1.astype(str).merge(df2.astype(str), how='left', indicator=True)
.query('_merge == "left_only"')[df1.columns])
print (df3)
order_id shop
2 99999 'FR'
3 12345 'UK'
Also is possible check if same dtypes before solution:
print (df1.dtypes)
print (df2.dtypes))
And convert only column(s) which is different dtype
:
df2['order_id'] = df2['order_id'].astype(str)
Upvotes: 2