Reputation: 31
I have two tables in Snowflake - one with color data and one with ranks for each color :
Type1 | Type2 | Type3 | Type4 | Type5 |
---|---|---|---|---|
red | white | yellow | red | white |
yellow | white | yellow | white | yellow |
color | rank |
---|---|
red | 1 |
white | 2 |
yellow | 3 |
How can I pull in top 3 colors with highest ranks in the same row? (rank 1 is the highest and if there are ties, you can pull in any of the ties). Actual data is much larger so hoping to find a solution that avoids manual case statements.
Desired result is the following:
Highest | Second | Third |
---|---|---|
red | red | white |
white | white | yellow |
Upvotes: 0
Views: 103
Reputation: 1075
Not an expert in the snowflake but Oracle. You can take the idea from this Oracle query:
SELECT ID,
MAX(CASE WHEN NEW_RANK = 1 THEN COLOR END) AS HIGHEST,
MAX(CASE WHEN NEW_RANK = 2 THEN COLOR END) AS SECOND,
MAX(CASE WHEN NEW_RANK = 3 THEN COLOR END) AS THIRD
FROM
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RNK) AS NEW_RANK,
COLOR
FROM
(SELECT T.ID,
CASE WHEN L.LVL =1 THEN R1.RANK
WHEN L.LVL =2 THEN R2.RANK
WHEN L.LVL =3 THEN R3.RANK
WHEN L.LVL =4 THEN R4.RANK
WHEN L.LVL =5 THEN R5.RANK
END AS RNK,
CASE WHEN L.LVL =1 THEN R1.COLOR
WHEN L.LVL =2 THEN R2.COLOR
WHEN L.LVL =3 THEN R3.COLOR
WHEN L.LVL =4 THEN R4.COLOR
WHEN L.LVL =5 THEN R5.COLOR
END AS COLOR
FROM TYPES T
JOIN RANKS R1 ON R1.COLOR = T.TYPE1
JOIN RANKS R2 ON R2.COLOR = T.TYPE2
JOIN RANKS R3 ON R3.COLOR = T.TYPE3
JOIN RANKS R4 ON R4.COLOR = T.TYPE4
JOIN RANKS R5 ON R5.COLOR = T.TYPE5
CROSS JOIN (SELECT LEVEL LVL FROM DUAL CONNECT BY LEVEL <= 5) L
)
)
GROUP BY ID
ORDER BY ID
I am getting the correct result with sample data.
Upvotes: 0
Reputation: 26068
if we turn you data into tables/CTE's:
with types(id, c1, c2) as (
select * from values
(1, 'r', 'y' ),
(2, 'w', 'w' ),
(3, 'y', 'y' ),
(4, 'r', 'w' ),
(5, 'w', 'y' )
), ranks (c, r) as (
select * from values
('r', 1),
('w', 2),
('y', 3)
)
then a double join with can be used to check the rank value and thus rank order:
select
t.c1
,t.c2
,r1.r + r2.r as rank_val
,row_number() over (order by (r1.r + r2.r) ) as rank_order
from types as t
join ranks as r1
on t.c1 = r1.c
join ranks as r2
on t.c2 = r2.c
which means we can then use a qualify to filter:
select
t.c1
,t.c2
from types as t
join ranks as r1
on t.c1 = r1.c
join ranks as r2
on t.c2 = r2.c
qualify row_number() over (order by (r1.r + r2.r) ) <= 3;
Or we can just use a DECODE (which is compact version of a CASE, which you noted you did not want to use, your choice, just more options)
select
t.c1
,t.c2
,decode(t.c1, 'r', 1, 'w', 2, 'y', 3) + decode(t.c2, 'r', 1, 'w', 2, 'y', 3) as rank_val
from types as t
order by rank_val
thus qualify on that instead:
with types(id, c1, c2) as (
select * from values
(1, 'r', 'y' ),
(2, 'w', 'w' ),
(3, 'y', 'y' ),
(4, 'r', 'w' ),
(5, 'w', 'y' )
)
select
t.c1
,t.c2
from types as t
qualify row_number() over (order by decode(t.c1, 'r', 1, 'w', 2, 'y', 3) + decode(t.c2, 'r', 1, 'w', 2, 'y', 3) ) <= 3
;
noting this is an unstable sort, and for data sets, you will get random behavior on the margin out of this, thus some form of secondary+ sorts should be added, to make the data stable.
Upvotes: 1