Reputation: 409
I am stuck in an issue where I want the two latest values of a column(REVISION) based on the max(last_updated_date).
Scenario:
There is a Revision column corresponding to each part column( hardware part). Whenever there is any change in the part, the revision gets changed and the last_updated date also changes. Revision can be anything numbers, alphabets, - _ etc.
Suppose if I have 100 parts and 60 parts do not change and 40 changes. So 40 will have at least two latest revisions. In total there would be 60+40*2=140 parts in the output.
Without using QUALIFY, I am getting more than 5M distinct parts. So at least I should get 5M records(in case when no part has underwent any revision).
SELECT DISTINCT FROM_NAME
FROM(
select
CDR_ODS_R_GE_OBJ_HST .FROM_ID as FROM_ID,
CDR_ODS_R_GE_OBJ_HST .FROM_NAME as FROM_NAME ,
CDR_ODS_R_GE_OBJ_HST .FROM_REVISION as FROM_REVISION,
max(CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE) as LAST_UPDATE_DATE
--RANK( ) OVER ( ORDER BY max(CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE) DESC) AS RANK1
from GEEDW_PLM_ODS_BULK_V.CDR_ODS_R_GE_OBJ_HST CDR_ODS_R_GE_OBJ_HST
--WHERE CDR_ODS_R_GE_OBJ_HST.FROM_name='323A4747UUP15A'
--QUALIFY RANK1<=2
group by 1,2,3
) TM
14728.20721.304.13308(from_id) R-0331128(from_name) -(revision) 8/7/2013 20:30:02(last_updated date)
But while using qualify <=2 with rank getting only 186 parts.
select
CDR_ODS_R_GE_OBJ_HST .FROM_ID as FROM_ID,
CDR_ODS_R_GE_OBJ_HST .FROM_NAME as FROM_NAME ,
CDR_ODS_R_GE_OBJ_HST .FROM_REVISION as FROM_REVISION,
max(CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE) as LAST_UPDATE_DATE
,RANK( ) OVER ( ORDER BY max(CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE) DESC) AS RANK1
from GEEDW_PLM_ODS_BULK_V.CDR_ODS_R_GE_OBJ_HST CDR_ODS_R_GE_OBJ_HST
--WHERE CDR_ODS_R_GE_OBJ_HST.FROM_name='323A4747UUP15A'
QUALIFY RANK1<=2
group by 1,2,3
In Nutshell, from the below query, i want the top two values for revision 0024 corresponding to the latest 2 updated date.
select FROM_NAME,FROM_REVISION, LAST_UPDATE_DATE
from GEEDW_PLM_ODS_BULK_V.CDR_ODS_R_GE_OBJ_HST CDR_ODS_R_GE_OBJ_HST
0024 301345498360631 1/24/2014 11:22:17
0024 431365606243002 12/16/2013 20:16:44
0024 491333037555534 6/6/2013 18:08:51
Upvotes: 2
Views: 692
Reputation: 60482
You need to add a PRATITION BY, otherwise it's ranking over all parts. In your case there where 186 parts with the same max date, all got the same rank 1 and the next rank was 187.
select
CDR_ODS_R_GE_OBJ_HST .FROM_ID as FROM_ID,
CDR_ODS_R_GE_OBJ_HST .FROM_NAME as FROM_NAME ,
CDR_ODS_R_GE_OBJ_HST .FROM_REVISION as FROM_REVISION,
max(CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE) as LAST_UPDATE_DATE
,RANK( )
OVER (PARTITION BY FROM_NAME
ORDER BY max(CDR_ODS_R_GE_OBJ_HST.LAST_UPDATE_DATE) DESC) AS RANK1
from GEEDW_PLM_ODS_BULK_V.CDR_ODS_R_GE_OBJ_HST CDR_ODS_R_GE_OBJ_HST
--WHERE CDR_ODS_R_GE_OBJ_HST.FROM_name='323A4747UUP15A'
group by 1,2,3
QUALIFY RANK1<=2
Btw, are you sure you need the GROUP BY? Maybe you confused this with the old deprecated RANK syntax where GROUP BY was used instead of PARTITION. This might return the same result more efficiently:
select
CDR_ODS_R_GE_OBJ_HST .FROM_ID as FROM_ID,
CDR_ODS_R_GE_OBJ_HST .FROM_NAME as FROM_NAME ,
CDR_ODS_R_GE_OBJ_HST .FROM_REVISION as FROM_REVISION,
CDR_ODS_R_GE_OBJ_HST .LAST_UPDATE_DATE as LAST_UPDATE_DATE
,RANK( )
OVER (PARTITION BY FROM_NAME
ORDER BY max(CDR_ODS_R_GE_OBJ_HST.LAST_UPDATE_DATE) DESC) AS RANK1
from GEEDW_PLM_ODS_BULK_V.CDR_ODS_R_GE_OBJ_HST CDR_ODS_R_GE_OBJ_HST
--WHERE CDR_ODS_R_GE_OBJ_HST.FROM_name='323A4747UUP15A'
QUALIFY RANK1<=2
Upvotes: 1