Reputation: 35
I have two data frames DF1 and DF2. I need to remove all rows in DF2 that match both the EEnum and Date from DF1. There are multiple date columns in DF2 but I need Date in DF1 to validate to Date5 in DF2.
DF1:
EEnum | Date |
---|---|
123 | 01/01/2022 |
123 | 06/01/2022 |
123 | 06/15/2022 |
124 | 01/01/2022 |
124 | 01/02/2022 |
DF2:
EEnum | Date5 |
---|---|
123 | 01/01/2022 |
123 | 05/01/2022 |
123 | 06/01/2022 |
123 | 06/25/2022 |
124 | 01/01/2022 |
124 | 01/03/2022 |
Result DF2:
EEnum | Date5 |
---|---|
123 | 05/01/2022 |
123 | 06/25/2022 |
124 | 01/03/2022 |
I tried the following but it is removing all rows in DF2 where the Date in DF1 exists regardless of EEnum.
DF2 = DF2.loc[~((DF2.EEnum.isin(DF1['EEnum']))&(DF2.Date5.isin(DF1['Date']))),:]
Any ideas as to where I am going wrong? Or a better way to go about this? Pretty new so any help is appreciated.
Upvotes: 0
Views: 37
Reputation: 260640
Use a merge
with indicator=True
, then filter:
(df2.merge(df1.rename(columns={'Date': 'Date5'}),
indicator=True, how='outer')
.query('_merge == "left_only"')
.drop(columns='_merge')
)
output:
EEnum Date5
1 123 05/01/2022
3 123 06/25/2022
5 124 01/03/2022
intermediate (merge
only):
EEnum Date5 _merge
0 123 01/01/2022 both
1 123 05/01/2022 left_only
2 123 06/01/2022 both
3 123 06/25/2022 left_only
4 124 01/01/2022 both
5 124 01/03/2022 left_only
6 123 06/15/2022 right_only
7 124 01/02/2022 right_only
Upvotes: 1