CLV
CLV

Reputation: 31

SQL rank and reorder columns

I have two tables in Snowflake - one with color data and one with ranks for each color:

Color table:

Index color1 color2 color3 color4 color5
1 red orange red blue violet
2 indigo yellow blue yellow violet

Rank table:

Color Rank
Red 1
Orange 2
Yellow 3
Green 4
Blue 5
Indigo 6
Orange 7
Violet 8

I'd like to use the two tables above to rank each colors in each row and show top 3 colors in the resulting table. So resulting table will have red, red, and orange for index 1 and yellow, yellow, and blue for index 2.

Index Rank1 Rank2 Rank3
1 red red orange
2 yellow yellow blue

Any help would be appreciated. Thanks!

Upvotes: 1

Views: 149

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 26078

So using the CTE for the DATA, and encoding the ranking via DECODE (this step could be a TABLE/CTE and a join could be used). We UNPIVOT the five color columns and then assign the color rank, and then rank the ranks with ROW_NUMBER, we then filter out the position above 3 and then use PIVOT to roll these back into wide columns, and use MAX to keep the non-null values.

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    qualify position <= 3
)
select 
    idx
    ,max("1") as color1
    ,max("2") as color2
    ,max("3") as color3
from unpivoted
PIVOT(MAX(color) FOR position IN (1,2,3))
      AS p
group by 1
order by 1;

enter image description here

MIN_BY:

The PIVOT can be swapped for MIN_BY like so:

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    qualify position <= 3
)
select idx, min_by(color, position, 3) as min3
from unpivoted
group by 1;

enter image description here

at which point, you could pull the three values out in another layer:

with data(idx, c1, c2, c3, c4, c5) as (
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
), unpivoted as (
    select 
        *
        ,decode(color, 'red', 1, 
            'orange', 2,
            'yellow', 3,
            'green',4,
            'blue', 5,
            'indigo', 6,
            'orange', 7,
            'violet', 8) as rank
        ,row_number() over (partition by idx order by rank) as position
    from data
    UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
), arrayed as (
    select idx, min_by(color, position, 3) as min3
    from unpivoted
    group by 1
)
select 
    idx, 
    min3[0]::text as c1, 
    min3[1]::text as c2, 
    min3[2]::text as c3
from arrayed;

enter image description here

using tables:

create table data as 
    select *
    from values
     (1, 'red', 'orange', 'red', 'blue', 'violet'),
     (2, 'indigo', 'yellow', 'blue', 'yellow', 'violet')
     t(idx, c1, c2, c3, c4, c5);

create table ranks as 
    select *
    from values
    ('red', 1),
    ('orange', 2),
    ('yellow', 3),
    ('green', 4),
    ('blue', 5),
    ('indigo', 6),
    ('orange', 7),
    ('violet', 8)  
    t(color, rank);

we can use a natural join like so:

with pos_and_rank as (
    select *
        ,row_number() over (partition by idx order by rank) as position
    from (
        select idx, color
        from data
            UNPIVOT(color FOR col IN (c1,c2,c3,c4,c5))
    )
    natural join ranks

), arrayed as (
    select idx, min_by(color, position, 3) as min3
    from pos_and_rank
    group by 1
)
select 
    idx, 
    min3[0]::text as c1, 
    min3[1]::text as c2, 
    min3[2]::text as c3
from arrayed
order by idx;

enter image description here

Upvotes: 3

esqew
esqew

Reputation: 44710

Warning: this is pretty ugly, but it should get the job done.

Borrowing a bit from @Error_2646's idea in the comments on the OP, you could UNPIVOT your existing data structure, JOIN to the Rank information, then re-cobble the data into your prescribed format. The actual practicality of this will likely depend on the scale of the data on which you intend to apply this to, and hasn't been tested on any meaningfully large dataset.

(I've elected not to PIVOT the data back in and instead have manually assembled it using a few subqueries, since PIVOT's support for, and performance on, textual data is generally quite suboptimal.)

WITH unpivoted_data AS (
  SELECT 
    [Index], 
    Color.Color, 
    ROW_NUMBER() OVER (
      PARTITION BY [Index] 
      ORDER BY 
        Rank.Rank
    ) AS PartitionRank 
  FROM 
    Color UNPIVOT (
      Color FOR Colors IN (
        color1, color2, color3, color4, color5
      )
    ) AS Color 
    JOIN Rank ON LOWER(Rank.Color) = Color.Color
) 
SELECT 
  [Index], 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 1
  ) AS Rank1, 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 2
  ) AS Rank2, 
  (
    SELECT 
      Color 
    FROM 
      unpivoted_data 
    WHERE 
      unpivoted_data.[Index] = ranked_colors.[Index] 
      AND unpivoted_data.[PartitionRank] = 3
  ) AS Rank3 
FROM 
  (
    SELECT 
      DISTINCT [Index] 
    FROM 
      unpivoted_data
  ) ranked_colors 
ORDER BY 
  [Index] ASC

Result:

Index Rank1 Rank2 Rank3
1 red red orange
2 yellow yellow blue

Upvotes: 0

Related Questions