Reputation: 113
I have recently started to learn database queries in Oracle, but there is one thing I am having a lot of trouble understanding about grouping.
It is best explained with example. Say I have source data that looks like this:
MY_SOURCE
ID | Fruit
----------
1 | Orange
1 | Apple
1 | Orange
2 | Banana
2 | Apple
3 | Apple
3 | Apple
3 | Orange
Suppose certain values of fruit are have different rank/priority. Say the rank (1=most important; 3=least important) is:
Fruit Rank:
1. Banana
2. Orange
3. Apple
I would like to group data by ID. When I do so, there will be duplicates for the fruit column for each ID in the data source which have to be aggregated. For instance, for ID one, the set of possible values is:
Orange, Apple, Orange
At this point, instead of aggregating the duplicate fruit sets with something like stats_mode(), I would like to group the data by fruit rank and only display the value with the highest rank. So, the output would be:
ID | Fruit
----------
1 | Orange
2 | Banana
3 | Orange
Is there a way to accomplish this in SQL?
I am imagining a query that looks something like:
SELECT DISTINCT
ID,
MAGIC_MAX_RANK_FUNCTION(FRUIT, ['Banana','Orange', 'Apple'])
FROM
MY_SOURCE
In technical terms, we are grouping the duplicate fruit data according to a rank specified by the programmer for all of that column's possible values.
Thank you in advance!
Upvotes: 2
Views: 658
Reputation: 1269513
Instead of GROUP BY
, I would just use ROW_NUMBER()
:
select s.*
from (select s.*,
row_number() over (partition by id
order by (case fruit when 'banana' then 1 when 'orange' then 2 when 'apple' then 3 else 999 end)
) as seqnum
from my_source s
) s
where seqnum = 1;
Another method uses union all
and is best for short lists:
select s.*
from my_source s
where s.fruit = 'banana'
union all
select s.*
from my_source s
where s.fruit = 'orange' and
not exists (select 1 from my_source s2 where s2.id = s.id and s2.fruit in ('banana'))
union all
select s.*
from my_source s
where s.fruit = 'apple' and
not exists (select 1 from my_source s2 where s2.id = s.id and s2.fruit in ('banana', 'orange'));
Under some circumstances, this method might be faster than the previous method (and vice versa).
Upvotes: 1