Reputation: 1207
I have a table in a DW system (say AWS SnowFlake):
UPC_CODE A_PRICE A_QTY DATE COMPANY_CODE A_CAT
1001 100.25 2 2021-05-06 1 PB
1001 2122.75 10 2021-05-01 1 PB
1002 212.75 5 2021-05-07 2 PT
1002 3100.75 10 2021-05-01 2 PB
What I am looking for is :
For each UPC_CODE and COMPANY_CODE the latest data should be picked up
So the resultant table should be like below:
UPC_CODE A_PRICE A_QTY DATE COMPANY_CODE A_CAT
1001 100.25 2 2021-05-06 1 PB
1002 212.75 5 2021-05-07 2 PT
Approach: Below SQL string
SELECT UPC_CODE,A_PRICE,A_QTY,MAX(DATE) AS F_DATE,COMPANY_CODE,A_CAT
FROM <table_name>
GROUP BY 1,2,3,5,6
Can I have an alternative approach using partionby()
?
Upvotes: 0
Views: 44
Reputation: 1269603
In Snowflake (which your first line suggests), you would use QUALIFY
:
SELECT UPC_CODE, A_PRICE, A_QTY, DATE AS F_DATE, COMPANY_CODE, A_CAT
FROM <table_name>
QUALIFY ROW_NUMBER() OVER (PARTITION BYUPC_CODE, A_PRICE, A_QTY, COMPANY_CODE, A_CAT
ORDER BY DATE DESC
) = 1;
Upvotes: 0
Reputation: 520988
Your current GROUP BY
query doesn't really do what you have in mind. One canonical approach here uses ROW_NUMBER
:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT UPC_CODE, A_PRICE, A_QTY, DATE, COMPANY_CODE, A_CAT
FROM cte
WHERE rn = 1;
If you did want to use a GROUP BY
approach, here is one way to do that:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT UPC_CODE, COMPANY_CODE, MAX(DATE) AS MAX_DATE
FROM yourTable
GROUP BY UPC_CODE, COMPANY_CODE
) t2
ON t2.UPC_CODE = t1.UPC_CODE AND
t2.COMPANY_CODE = t1.COMPANY_CODE AND
t2.MAX_DATE = t1.DATE;
Upvotes: 2