Reputation: 15
I want to create a snapshot of the below table where it shows only 1 line per user with same number of columns.
My Priority for selecting a user is from the source field by portal.
Below are the details --
How do I achieve in Snowflake SQL? Please see the original table and expected output below.
ORIGINAL TABLE:
Date | user | portal | country | state | source |
---|---|---|---|---|---|
12/1/21 | 2346232 | AAA | CA | ON | Prod |
1/30/22 | 2534657 | AAA | CA | BC | QA |
3/31/22 | 2534657 | AAA | US | TX | Max |
5/30/22 | 3454364 | AAA | US | TX | Prod |
7/29/22 | 3543591 | AAA | US | CA | Prod |
9/27/22 | 3543591 | AAA | US | CA | Max |
11/26/22 | 3543753 | AAA | US | CA | Stage |
1/25/23 | 3546534 | AAA | CA | ON | Max |
3/26/23 | 3543591 | BBB | US | CA | Max |
EXPECTED OUTPUT FROM SNOWFLAKE-SQL (highlighted in Bold):
Date | user | portal | country | state | source |
---|---|---|---|---|---|
12/1/21 | 2346232 | AAA | CA | ON | Prod |
1/30/22 | 2534657 | AAA | CA | BC | QA |
5/30/22 | 3454364 | AAA | US | TX | Prod |
7/29/22 | 3543591 | AAA | US | CA | Prod |
11/26/22 | 3543753 | AAA | US | CA | Stage |
1/25/23 | 3546534 | AAA | CA | ON | Max |
3/26/23 | 3543591 | BBB | US | CA | Max |
Upvotes: 0
Views: 1777
Reputation: 26120
So your data as a CTE:
with data(Date, user, portal, country, state, source) as (
select * from values
('12/1/21 ', 2346232, 'AAA', 'CA', 'ON', 'Prod'),
('1/30/22 ', 2534657, 'AAA', 'CA', 'BC', 'QA'),
('3/31/22 ', 2534657, 'AAA', 'US', 'TX', 'Max'),
('5/30/22 ', 3454364, 'AAA', 'US', 'TX', 'Prod'),
('7/29/22 ', 3543591, 'AAA', 'US', 'CA', 'Prod'),
('9/27/22 ', 3543591, 'AAA', 'US', 'CA', 'Max'),
('11/26/22', 3543753, 'AAA', 'US', 'CA', 'Stage'),
('1/25/23 ', 3546534, 'AAA', 'CA', 'ON', 'Max'),
('3/26/23 ', 3543591, 'BBB', 'US', 'CA', 'Max')
)
and a simple check of the rules:
using a CASE statement to turn source
into a number for ordering, and using row_number() to allocate a per protal/user ranking:
select d.*
,case source
when 'Prod' then 1
when 'QA' then 2
when 'Stage' then 3
when 'Max' then 4
end as rank
,row_number() over (partition by portal, user order by rank) as rn
from data as d;
gives:
DATE | USER | PORTAL | COUNTRY | STATE | SOURCE | RANK | RN |
---|---|---|---|---|---|---|---|
12/1/21 | 2,346,232 | AAA | CA | ON | Prod | 1 | 1 |
1/30/22 | 2,534,657 | AAA | CA | BC | QA | 2 | 1 |
3/31/22 | 2,534,657 | AAA | US | TX | Max | 4 | 2 |
5/30/22 | 3,454,364 | AAA | US | TX | Prod | 1 | 1 |
7/29/22 | 3,543,591 | AAA | US | CA | Prod | 1 | 1 |
9/27/22 | 3,543,591 | AAA | US | CA | Max | 4 | 2 |
11/26/22 | 3,543,753 | AAA | US | CA | Stage | 3 | 1 |
1/25/23 | 3,546,534 | AAA | CA | ON | Max | 4 | 1 |
3/26/23 | 3,543,591 | BBB | US | CA | Max | 4 | 1 |
at this point it looks good, now we can use a QUALIFY to only take the values we want, but first we can swap the CASE for a DECODE to make it more tidy to use in-line:
thus:
,DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)
gives the same results as the case, thus the solution is:
select d.*
from data as d
qualify row_number() over (partition by portal, user order by DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)) = 1
DATE | USER | PORTAL | COUNTRY | STATE | SOURCE |
---|---|---|---|---|---|
12/1/21 | 2,346,232 | AAA | CA | ON | Prod |
1/30/22 | 2,534,657 | AAA | CA | BC | QA |
5/30/22 | 3,454,364 | AAA | US | TX | Prod |
7/29/22 | 3,543,591 | AAA | US | CA | Prod |
11/26/22 | 3,543,753 | AAA | US | CA | Stage |
1/25/23 | 3,546,534 | AAA | CA | ON | Max |
3/26/23 | 3,543,591 | BBB | US | CA | Max |
a side note:
you data did not need date
, country
, or state
,
and user 3454364
repeated the same logic of 2346232
so could have just been, and likely could have been smaller again.
with data(user, portal, source) as (
select * from values
(1, 'A', 'Prod'),
(2, 'A', 'QA'),
(2, 'A', 'Max'),
(4, 'A', 'Prod'),
(4, 'A', 'Max'),
(5, 'A', 'Stage'),
(6, 'A', 'Max'),
(4, 'B', 'Max')
)
select d.*
from data as d
qualify row_number() over (partition by portal, user order by DECODE(source, 'Prod', 1, 'QA', 2, 'Stage', 3, 'Max', 4)) = 1
USER | PORTAL | SOURCE |
---|---|---|
1 | A | Prod |
2 | A | QA |
4 | A | Prod |
5 | A | Stage |
6 | A | Max |
4 | B | Max |
Upvotes: 1