Reputation: 43
I have two dataframes that contain duplicates, but I need to remove only the rows with duplicate VIN numbers, without looking at the other cells.
0   230  5UXCR6C50KTQ4xxxx  KLL34607  2019  BMW  M3
1   116  5UXCR4C00LLW6xxxx  LLW63494  2020  BMW  X5  Not Found
2   109  5UXCR6C06LLL7xxxx  LLL76916  2020  BMW  X5  Need Detail
38  229  5UXCR6C50KLL3xxxx  MLL23650  2019  BMW  X5
43  115  5UXCR4C00LLW6xxxx  LLW63494  2020  BMW  X5
37  108  5UXCR6C06LLL7xxxx  LLL76916  2020  BMW  X5
The last two rows look like distinct rows to pandas, but I need to merge the two dataframes and remove duplicates based only on the VIN numbers, ignoring the 'Not Found' and 'Need Detail' values.
I've tried .drop_duplicates(), .cumsum(), and a few other methods, but nothing seems to work.
Upvotes: 0
Views: 54
Reputation: 2564
I think what you mean is that you need to concatenate the two dataframes and then remove duplicated rows based on only a subset of columns.
You can use pd.concat([df1, df2]).drop_duplicates(subset=['VIN'])
where subset is the list of column names used to decide which rows are duplicates; all other columns are ignored. (See the documentation for extra details.)
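A minimal sketch of the idea, using made-up data shaped like yours (the column names 'VIN' and 'Status' are assumptions; substitute whatever your columns are actually called):

```python
import pandas as pd

# Two small dataframes sharing a VIN but differing in other cells
df1 = pd.DataFrame({
    'VIN': ['5UXCR4C00LLW6xxxx', '5UXCR6C06LLL7xxxx'],
    'Status': ['Not Found', 'Need Detail'],
})
df2 = pd.DataFrame({
    'VIN': ['5UXCR4C00LLW6xxxx', '5UXCR6C50KLL3xxxx'],
    'Status': [None, None],
})

# Concatenate, then keep only the first row seen for each VIN;
# the 'Status' column plays no part in the duplicate check.
merged = pd.concat([df1, df2], ignore_index=True)
deduped = merged.drop_duplicates(subset=['VIN'])
print(deduped)
```

By default drop_duplicates keeps the first occurrence of each VIN; pass keep='last' if you'd rather keep the row from the second dataframe.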
Upvotes: 1