pythondumb
pythondumb

Reputation: 1207

SQL: An alternative to Group By approach using Partion By

I have a table in a DW system (say AWS SnowFlake):

UPC_CODE    A_PRICE     A_QTY    DATE       COMPANY_CODE    A_CAT
1001        100.25       2     2021-05-06      1             PB
1001        2122.75      10    2021-05-01      1             PB
1002        212.75       5     2021-05-07      2             PT
1002        3100.75      10    2021-05-01      2             PB

What I am looking for is :

For each UPC_CODE and COMPANY_CODE the latest data should be picked up

So the resultant table should be like below:

UPC_CODE    A_PRICE     A_QTY    DATE       COMPANY_CODE    A_CAT
1001        100.25       2     2021-05-06      1             PB
1002        212.75       5     2021-05-07      2             PT

Approach: Below SQL string

SELECT  UPC_CODE,A_PRICE,A_QTY,MAX(DATE) AS F_DATE,COMPANY_CODE,A_CAT 
FROM <table_name>
GROUP BY 1,2,3,5,6

Can I have an alternative approach using partionby()?

Upvotes: 0

Views: 44

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269603

In Snowflake (which your first line suggests), you would use QUALIFY:

SELECT UPC_CODE, A_PRICE, A_QTY, DATE AS F_DATE, COMPANY_CODE, A_CAT 
FROM <table_name>
QUALIFY ROW_NUMBER() OVER (PARTITION BYUPC_CODE, A_PRICE, A_QTY, COMPANY_CODE, A_CAT
                           ORDER BY DATE DESC
                          ) = 1;

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520988

Your current GROUP BY query doesn't really do what you have in mind. One canonical approach here uses ROW_NUMBER:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE ORDER BY DATE DESC) rn
    FROM yourTable
)

SELECT UPC_CODE, A_PRICE, A_QTY, DATE, COMPANY_CODE, A_CAT
FROM cte
WHERE rn = 1;

If you did want to use a GROUP BY approach, here is one way to do that:

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT UPC_CODE, COMPANY_CODE, MAX(DATE) AS MAX_DATE
    FROM yourTable
    GROUP BY UPC_CODE, COMPANY_CODE
) t2
    ON t2.UPC_CODE = t1.UPC_CODE AND
       t2.COMPANY_CODE = t1.COMPANY_CODE AND
       t2.MAX_DATE = t1.DATE;

Upvotes: 2

Related Questions