SQL. Delete duplicate records based on multiple criteria

Question

I have a TABLE called Food with a few thousand records. Each record is a food consisting of a Description and 5 categories. Each succeeding category is more specific than the last. Description, cat1, and cat2 cannot accept NULL values. Cat3, cat4, and cat5 can accept NULL values.

For example:

description | cat1 | cat2    | cat3  | cat4      | cat5
Green apple | Food | produce | NULL  | apple     | NULL
Root beer   | Bev  | Non-Alc | NULL  | Root beer | NULL

The issue I'm having is that there are lots of duplicate descriptions assigned different categories. And I need help writing a code that can delete these duplicates by certain conditions.

Example:

description    | cat1 | cat2     | cat3     | cat4  | cat5
Mango Syrup    | Food | Produce  | NULL     | Mango | NULL
Mango Syrup    | Food | Dry Good | NULL     | NULL  | Syrup
Pepperoni Pizza| Food | Meat     | Pepperoni| NULL  | NULL
Pepperoni Pizza| Food | Bakery   | NULL     | Pizza | NULL

I have hundreds of these, and luckily, they're all set up the same way. Out of the three columns cat3, 4, and 5, only one can be populated. If a record's cat5 IS NOT NULL, cat3 and cat4 will be NULL. If a record's cat4 IS NOT NULL, cat3 and cat5 will be NULL. If a record's cat3 IS NOT NULL, cat4 and cat5 will be NULL.

The correct duplicate is the item whose record has the most specific category populated. All other duplicates should be deleted.

Things I've tried: GROUP BY and using MIN() or MAX() to aggregate different categories. This doesn't give me the result I want.

Anything will help. Appreciate it.

SQL. Delete duplicate records based on multiple criteria

Answers (1)

Related Questions