SAugustine
SAugustine

Reputation: 15

How to select records based on the multiple conditions in snowflake-SQL?

I want to create a snapshot of the below table where it shows only 1 line per user with same number of columns.

My Priority for selecting a user is from the source field by portal.

Below are the details --

How do I achieve in Snowflake SQL? Please see the original table and expected output below.

ORIGINAL TABLE:

Date user portal country state source
12/1/21 2346232 AAA CA ON Prod
1/30/22 2534657 AAA CA BC QA
3/31/22 2534657 AAA US TX Max
5/30/22 3454364 AAA US TX Prod
7/29/22 3543591 AAA US CA Prod
9/27/22 3543591 AAA US CA Max
11/26/22 3543753 AAA US CA Stage
1/25/23 3546534 AAA CA ON Max
3/26/23 3543591 BBB US CA Max

EXPECTED OUTPUT FROM SNOWFLAKE-SQL (highlighted in Bold):

Date user portal country state source
12/1/21 2346232 AAA CA ON Prod
1/30/22 2534657 AAA CA BC QA
5/30/22 3454364 AAA US TX Prod
7/29/22 3543591 AAA US CA Prod
11/26/22 3543753 AAA US CA Stage
1/25/23 3546534 AAA CA ON Max
3/26/23 3543591 BBB US CA Max

Upvotes: 0

Views: 1777

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 26120

So your data as a CTE:

with data(Date, user, portal, country, state, source) as (
    select * from values
    ('12/1/21 ', 2346232, 'AAA', 'CA', 'ON', 'Prod'),
    ('1/30/22 ', 2534657, 'AAA', 'CA', 'BC', 'QA'),
    ('3/31/22 ', 2534657, 'AAA', 'US', 'TX', 'Max'),
    ('5/30/22 ', 3454364, 'AAA', 'US', 'TX', 'Prod'),
    ('7/29/22 ', 3543591, 'AAA', 'US', 'CA', 'Prod'),
    ('9/27/22 ', 3543591, 'AAA', 'US', 'CA', 'Max'),
    ('11/26/22', 3543753, 'AAA', 'US', 'CA', 'Stage'),
    ('1/25/23 ', 3546534, 'AAA', 'CA', 'ON', 'Max'),
    ('3/26/23 ', 3543591, 'BBB', 'US', 'CA', 'Max')
)

and a simple check of the rules:

using a CASE statement to turn source into a number for ordering, and using row_number() to allocate a per protal/user ranking:

select d.*
    ,case source
        when 'Prod' then 1
        when 'QA' then 2
        when 'Stage' then 3
        when 'Max' then 4
    end as rank
    ,row_number() over (partition by portal, user order by rank) as rn
from data as d;

gives:

DATE USER PORTAL COUNTRY STATE SOURCE RANK RN
12/1/21 2,346,232 AAA CA ON Prod 1 1
1/30/22 2,534,657 AAA CA BC QA 2 1
3/31/22 2,534,657 AAA US TX Max 4 2
5/30/22 3,454,364 AAA US TX Prod 1 1
7/29/22 3,543,591 AAA US CA Prod 1 1
9/27/22 3,543,591 AAA US CA Max 4 2
11/26/22 3,543,753 AAA US CA Stage 3 1
1/25/23 3,546,534 AAA CA ON Max 4 1
3/26/23 3,543,591 BBB US CA Max 4 1

at this point it looks good, now we can use a QUALIFY to only take the values we want, but first we can swap the CASE for a DECODE to make it more tidy to use in-line:

thus:

,DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)

gives the same results as the case, thus the solution is:

select d.*
from data as d
qualify row_number() over (partition by portal, user order by DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)) = 1
DATE USER PORTAL COUNTRY STATE SOURCE
12/1/21 2,346,232 AAA CA ON Prod
1/30/22 2,534,657 AAA CA BC QA
5/30/22 3,454,364 AAA US TX Prod
7/29/22 3,543,591 AAA US CA Prod
11/26/22 3,543,753 AAA US CA Stage
1/25/23 3,546,534 AAA CA ON Max
3/26/23 3,543,591 BBB US CA Max

a side note:

you data did not need date, country, or state, and user 3454364 repeated the same logic of 2346232

so could have just been, and likely could have been smaller again.

with data(user, portal, source) as (
    select * from values
    (1, 'A', 'Prod'),
    (2, 'A', 'QA'),
    (2, 'A', 'Max'),
    (4, 'A', 'Prod'),
    (4, 'A', 'Max'),
    (5, 'A', 'Stage'),
    (6, 'A', 'Max'),
    (4, 'B', 'Max')
)
select d.*
from data as d
qualify row_number() over (partition by portal, user order by DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)) = 1
USER PORTAL SOURCE
1 A Prod
2 A QA
4 A Prod
5 A Stage
6 A Max
4 B Max

Upvotes: 1

Related Questions