Bumbaro2217
Bumbaro2217

Reputation: 35

Delete rows in dataframe based on 2 column conditional match in separate dataframe

I have two data frames DF1 and DF2. I need to remove all rows in DF2 that match both the EEnum and Date from DF1. There are multiple date columns in DF2 but I need Date in DF1 to validate to Date5 in DF2.

DF1:

EEnum Date
123 01/01/2022
123 06/01/2022
123 06/15/2022
124 01/01/2022
124 01/02/2022

DF2:

EEnum Date5
123 01/01/2022
123 05/01/2022
123 06/01/2022
123 06/25/2022
124 01/01/2022
124 01/03/2022

Result DF2:

EEnum Date5
123 05/01/2022
123 06/25/2022
124 01/03/2022

I tried the following but it is removing all rows in DF2 where the Date in DF1 exists regardless of EEnum.

DF2 = DF2.loc[~((DF2.EEnum.isin(DF1['EEnum']))&(DF2.Date5.isin(DF1['Date']))),:]

Any ideas as to where I am going wrong? Or a better way to go about this? Pretty new so any help is appreciated.

Upvotes: 0

Views: 37

Answers (1)

mozway
mozway

Reputation: 260640

Use a merge with indicator=True, then filter:

(df2.merge(df1.rename(columns={'Date': 'Date5'}),
           indicator=True, how='outer')
    .query('_merge == "left_only"')
    .drop(columns='_merge')
)

output:

   EEnum       Date5
1    123  05/01/2022
3    123  06/25/2022
5    124  01/03/2022

intermediate (merge only):

   EEnum       Date5      _merge
0    123  01/01/2022        both
1    123  05/01/2022   left_only
2    123  06/01/2022        both
3    123  06/25/2022   left_only
4    124  01/01/2022        both
5    124  01/03/2022   left_only
6    123  06/15/2022  right_only
7    124  01/02/2022  right_only

Upvotes: 1

Related Questions