Reputation: 1
I have a TABLE
called Food with a few thousand records. Each record is a food consisting of a Description and 5 categories. Each succeeding category is more specific than the last. Description, cat1, and cat2 cannot accept NULL
values. Cat3, cat4, and cat5 can accept NULL
values.
For example:
description | cat1 | cat2 | cat3 | cat4 | cat5
Green apple | Food | produce | NULL | apple | NULL
Root beer | Bev | Non-Alc | NULL | Root beer | NULL
The issue I'm having is that there are lots of duplicate descriptions assigned different categories. And I need help writing a code that can delete these duplicates by certain conditions.
Example:
description | cat1 | cat2 | cat3 | cat4 | cat5
Mango Syrup | Food | Produce | NULL | Mango | NULL
Mango Syrup | Food | Dry Good | NULL | NULL | Syrup
Pepperoni Pizza| Food | Meat | Pepperoni| NULL | NULL
Pepperoni Pizza| Food | Bakery | NULL | Pizza | NULL
I have hundreds of these, and luckily, they're all set up the same way. Out of the three columns cat3, 4, and 5, only one can be populated. If a record's cat5 IS NOT NULL
, cat3 and cat4 will be NULL
. If a record's cat4 IS NOT NULL
, cat3 and cat5 will be NULL
. If a record's cat3 IS NOT NULL
, cat4 and cat5 will be NULL
.
The correct duplicate is the item whose record has the most specific category populated. All other duplicates should be deleted.
Things I've tried:
GROUP BY
and using MIN()
or MAX()
to aggregate different categories. This doesn't give me the result I want.
Anything will help. Appreciate it.
Upvotes: 0
Views: 910
Reputation: 1269633
So for each description, you want the row that has the "biggest" category filled.
One method uses row_number()
:
select t.*
from (select t.*,
row_number() over (partition by description
order by case when cat5 is null then 1
when cat4 is null then 2
when cat3 is null then 3
when cat2 is null then 4
when cat1 is null then 5
end
) as seqnum
from t
) t
where seqnum = 1;
If you actually want to delete the other rows, then use an updatable CTE:
with todelete as (
select t.*
from (select t.*,
row_number() over (partition by description
order by case when cat5 is null then 1
when cat4 is null then 2
when cat3 is null then 3
when cat2 is null then 4
when cat1 is null then 5
end
) as seqnum
from t
) t
)
delete from todelete
where seqnum > 1;
Upvotes: 1