I have two tables in Snowflake - one with color data and one with ranks for each color: Color table: Index color1 color2 color3 color4 color5 1 red orange red blue violet 2 indigo yellow blue yellow violet Rank table: Color Rank Red 1 Orange 2 Yellow 3 Green 4 Blue 5 Indigo 6 Orange 7 Violet 8 I'd like to use the two tables above to rank each colors in each row and show top 3 colors in the resulting table. So resulting table will have red, red, and orange for index 1 and yellow, yellow, and blue for index 2. Index Rank1 Rank2 Rank3 1 red red orange 2 yellow yellow blue Any help would be appreciated. Thanks!

Reputation: 31

SQL rank and reorder columns

I have two tables in Snowflake - one with color data and one with ranks for each color:

Color table:

Index	color1	color2	color3	color4	color5
1	red	orange	red	blue	violet
2	indigo	yellow	blue	yellow	violet

Rank table:

Color	Rank
Red	1
Orange	2
Yellow	3
Green	4
Blue	5
Indigo	6
Orange	7
Violet	8

I'd like to use the two tables above to rank each colors in each row and show top 3 colors in the resulting table. So resulting table will have red, red, and orange for index 1 and yellow, yellow, and blue for index 2.

Index	Rank1	Rank2	Rank3
1	red	red	orange
2	yellow	yellow	blue

Any help would be appreciated. Thanks!

Upvotes: 1

Answers (2)

Simeon Pilgrim

Reputation: 26078

So using the CTE for the DATA, and encoding the ranking via DECODE (this step could be a TABLE/CTE and a join could be used). We UNPIVOT the five color columns and then assign the color rank, and then rank the ranks with ROW_NUMBER, we then filter out the position above 3 and then use PIVOT to roll these back into wide columns, and use MAX to keep the non-null values.

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    qualify position <= 3
)
select 
    idx
    ,max("1") as color1
    ,max("2") as color2
    ,max("3") as color3
from unpivoted
PIVOT(MAX(color) FOR position IN (1,2,3))
      AS p
group by 1
order by 1;

MIN_BY:

The PIVOT can be swapped for MIN_BY like so:

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    qualify position <= 3
)
select idx, min_by(color, position, 3) as min3
from unpivoted
group by 1;

at which point, you could pull the three values out in another layer:

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
), arrayed as (
    select idx, min_by(color, position, 3) as min3
    from unpivoted
    group by 1
)
select 
    idx, 
    min3[0]::text as c1, 
    min3[1]::text as c2, 
    min3[2]::text as c3
from arrayed;

using tables:

create table data as 
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
     t(idx, c1, c2, c3, c4, c5);

create table ranks as 
    select *
    from values
    ('red', 1),
    ('orange', 2),
    ('yellow', 3),
    ('green', 4),
    ('blue', 5),
    ('indigo', 6),
    ('orange', 7),
    ('violet', 8)  
    t(color, rank);

we can use a natural join like so:

with pos_and_rank as (
    select *
        ,row_number() over (partition by idx order by rank) as position
    from (
        select idx, color
        from data
            UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    )
    natural join ranks

), arrayed as (
    select idx, min_by(color, position, 3) as min3
    from pos_and_rank
    group by 1
)
select 
    idx, 
    min3[0]::text as c1, 
    min3[1]::text as c2, 
    min3[2]::text as c3
from arrayed
order by idx;

Upvotes: 3

esqew

Reputation: 44710

Warning: this is pretty ugly, but it should get the job done.

Borrowing a bit from @Error_2646's idea in the comments on the OP, you could UNPIVOT your existing data structure, JOIN to the Rank information, then re-cobble the data into your prescribed format. The actual practicality of this will likely depend on the scale of the data on which you intend to apply this to, and hasn't been tested on any meaningfully large dataset.

(I've elected not to PIVOT the data back in and instead have manually assembled it using a few subqueries, since PIVOT's support for, and performance on, textual data is generally quite suboptimal.)

WITH unpivoted_data AS (
  SELECT 
    [Index], 
    Color.Color, 
    ROW_NUMBER() OVER (
      PARTITION BY [Index] 
      ORDER BY 
        Rank.Rank
    ) AS PartitionRank 
  FROM 
    Color UNPIVOT (
      Color FOR Colors IN (
        color1, color2, color3, color4, color5
      )
    ) AS Color 
    JOIN Rank ON LOWER(Rank.Color) = Color.Color
) 
SELECT 
  [Index], 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 1
  ) AS Rank1, 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 2
  ) AS Rank2, 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 3
  ) AS Rank3 
FROM 
  (
    SELECT 
      DISTINCT [Index] 
    FROM 
      unpivoted_data
  ) ranked_colors 
ORDER BY 
  [Index] ASC

Result:

Index	Rank1	Rank2	Rank3
1	red	red	orange
2	yellow	yellow	blue

Upvotes: 0

SQL rank and reorder columns

Answers (2)

MIN_BY:

using tables:

Related Questions