Reputation: 31
I have two tables in Snowflake - one with color data and one with ranks for each color:
Color table:
Index | color1 | color2 | color3 | color4 | color5 |
---|---|---|---|---|---|
1 | red | orange | red | blue | violet |
2 | indigo | yellow | blue | yellow | violet |
Rank table:
Color | Rank |
---|---|
Red | 1 |
Orange | 2 |
Yellow | 3 |
Green | 4 |
Blue | 5 |
Indigo | 6 |
Orange | 7 |
Violet | 8 |
I'd like to use the two tables above to rank each colors in each row and show top 3 colors in the resulting table. So resulting table will have red, red, and orange for index 1 and yellow, yellow, and blue for index 2.
Index | Rank1 | Rank2 | Rank3 |
---|---|---|---|
1 | red | red | orange |
2 | yellow | yellow | blue |
Any help would be appreciated. Thanks!
Upvotes: 1
Views: 149
Reputation: 26078
So using the CTE for the DATA, and encoding the ranking via DECODE (this step could be a TABLE/CTE and a join could be used). We UNPIVOT the five color columns and then assign the color rank, and then rank the ranks with ROW_NUMBER, we then filter out the position above 3 and then use PIVOT to roll these back into wide columns, and use MAX to keep the non-null values.
with data(idx, c1, c2, c3, c4, c5) as (
select *
from values
(1, 'red', 'orange', 'red', 'blue', 'violet'),
(2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
select
*
,decode(color, 'red', 1,
'orange', 2,
'yellow', 3,
'green',4,
'blue', 5,
'indigo', 6,
'orange', 7,
'violet', 8) as rank
,row_number() over (partition by idx order by rank) as position
from data
UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
qualify position <= 3
)
select
idx
,max("1") as color1
,max("2") as color2
,max("3") as color3
from unpivoted
PIVOT(MAX(color) FOR position IN (1,2,3))
AS p
group by 1
order by 1;
The PIVOT can be swapped for MIN_BY like so:
with data(idx, c1, c2, c3, c4, c5) as (
select *
from values
(1, 'red', 'orange', 'red', 'blue', 'violet'),
(2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
select
*
,decode(color, 'red', 1,
'orange', 2,
'yellow', 3,
'green',4,
'blue', 5,
'indigo', 6,
'orange', 7,
'violet', 8) as rank
,row_number() over (partition by idx order by rank) as position
from data
UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
qualify position <= 3
)
select idx, min_by(color, position, 3) as min3
from unpivoted
group by 1;
at which point, you could pull the three values out in another layer:
with data(idx, c1, c2, c3, c4, c5) as (
select *
from values
(1, 'red', 'orange', 'red', 'blue', 'violet'),
(2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
select
*
,decode(color, 'red', 1,
'orange', 2,
'yellow', 3,
'green',4,
'blue', 5,
'indigo', 6,
'orange', 7,
'violet', 8) as rank
,row_number() over (partition by idx order by rank) as position
from data
UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
), arrayed as (
select idx, min_by(color, position, 3) as min3
from unpivoted
group by 1
)
select
idx,
min3[0]::text as c1,
min3[1]::text as c2,
min3[2]::text as c3
from arrayed;
create table data as
select *
from values
(1, 'red', 'orange', 'red', 'blue', 'violet'),
(2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
t(idx, c1, c2, c3, c4, c5);
create table ranks as
select *
from values
('red', 1),
('orange', 2),
('yellow', 3),
('green', 4),
('blue', 5),
('indigo', 6),
('orange', 7),
('violet', 8)
t(color, rank);
we can use a natural join like so:
with pos_and_rank as (
select *
,row_number() over (partition by idx order by rank) as position
from (
select idx, color
from data
UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
)
natural join ranks
), arrayed as (
select idx, min_by(color, position, 3) as min3
from pos_and_rank
group by 1
)
select
idx,
min3[0]::text as c1,
min3[1]::text as c2,
min3[2]::text as c3
from arrayed
order by idx;
Upvotes: 3
Reputation: 44710
Warning: this is pretty ugly, but it should get the job done.
Borrowing a bit from @Error_2646's idea in the comments on the OP, you could UNPIVOT
your existing data structure, JOIN
to the Rank
information, then re-cobble the data into your prescribed format. The actual practicality of this will likely depend on the scale of the data on which you intend to apply this to, and hasn't been tested on any meaningfully large dataset.
(I've elected not to PIVOT
the data back in and instead have manually assembled it using a few subqueries, since PIVOT
's support for, and performance on, textual data is generally quite suboptimal.)
WITH unpivoted_data AS (
SELECT
[Index],
Color.Color,
ROW_NUMBER() OVER (
PARTITION BY [Index]
ORDER BY
Rank.Rank
) AS PartitionRank
FROM
Color UNPIVOT (
Color FOR Colors IN (
color1, color2, color3, color4, color5
)
) AS Color
JOIN Rank ON LOWER(Rank.Color) = Color.Color
)
SELECT
[Index],
(
SELECT
Color
FROM
unpivoted_data
WHERE
unpivoted_data.[Index] = ranked_colors.[Index]
AND unpivoted_data.[PartitionRank] = 1
) AS Rank1,
(
SELECT
Color
FROM
unpivoted_data
WHERE
unpivoted_data.[Index] = ranked_colors.[Index]
AND unpivoted_data.[PartitionRank] = 2
) AS Rank2,
(
SELECT
Color
FROM
unpivoted_data
WHERE
unpivoted_data.[Index] = ranked_colors.[Index]
AND unpivoted_data.[PartitionRank] = 3
) AS Rank3
FROM
(
SELECT
DISTINCT [Index]
FROM
unpivoted_data
) ranked_colors
ORDER BY
[Index] ASC
Result:
Index | Rank1 | Rank2 | Rank3 |
---|---|---|---|
1 | red | red | orange |
2 | yellow | yellow | blue |
Upvotes: 0