iAmGregorJ
iAmGregorJ

Reputation: 123

SQL - filtering out entire rows to get rid of "placeholder" data

The following query:

select
    dim.DW_SK_IndsatsDetaljer,
    dim.VisitationId as VisitationsId,
    dim.BestillingId as BestillingId,
    dim.ServicePeriodeSlut as ServicePeriodeSlut
from nexus2.Dim_IndsatsDetaljer as dim
where DW_SK_IndsatsDetaljer = '3803872' or DW_SK_IndsatsDetaljer = '3803871'

Produces the following:

DW_SK_IndsatsDetaljer   VisitationsId   BestillingId    ServicePeriodeSlut
3803871                 589682          589683          2019-09-23
3803872                 589682          589682          9999-12-31

From the source, I know that 9999-12-31 is a placeholder which exists until the system registers an actual date. So I DO need it, except in cases like this where another date was registered. I can see from the VisitationsId, that it's the same transaction, so I don't need the placeholder in this case. DW_SK_IndsatsDetaljer is nothing more than a row Id, and BestillingId is generated when something happens in the Visitation.

So I believe that getting the max(bestillingId) is key here - but I'm not sure how to do what I need to?

Basically, I need to group by VisitationId, and find the max(BestillingId) and just take the date from that. I've tried

select
    dim.DW_SK_IndsatsDetaljer,
    dim.VisitationId as VisitationsId,
    max(dim.BestillingId) as BestillingId,
    dim.ServicePeriodeSlut as ServicePeriodeSlut
from nexus2.Dim_IndsatsDetaljer as dim
where DW_SK_IndsatsDetaljer = '3803872' or DW_SK_IndsatsDetaljer = '3803871'
group by    dim.DW_SK_IndsatsDetaljer,
    VisitationId,
    ServicePeriodeSlut

which gives me the same result as above.

Can anyone point me in the right direction?

Upvotes: 1

Views: 98

Answers (2)

iAmGregorJ
iAmGregorJ

Reputation: 123

I actually stumbled upon the answer myself:

select
x.DW_SK_IndsatsDetaljer,
x.VisitationId,
max(x.BestillingId),
x.ServicePeriodeSlut
from (
        select
            f.VisitationId,
            max(f.BestillingId)
        from nexus2.Dim_IndsatsDetaljer as f
        group by VisitationId
    )

as y inner join nexus2.Dim_IndsatsDetaljer as x on x.VisitationId = y.VisitationId and x.BestillingId = y.BestillingId

So basically select what I need from the result of a simple query to get MAX(BestillingId) and joining the original table with that result. Worked like a charm, and wasn't very ressouce-heavy (the "real" code has a TON of columns, and more than just ServicePeriodeSlut that gave the problem).

Upvotes: 1

With row_number() window rank function you can select rows with max BestillingId for each visitationid

with cte as 
(
select
    dim.DW_SK_IndsatsDetaljer,
    dim.VisitationId as VisitationsId,
    dim.BestillingId as BestillingId,
    dim.ServicePeriodeSlut as ServicePeriodeSlut,
    row_number()over(partition by VisitationsId  order by BestillingId  desc)rn   
from nexus2.Dim_IndsatsDetaljer as dim
where DW_SK_IndsatsDetaljer = '3803872' or DW_SK_IndsatsDetaljer = '3803871'
)
select     dim.DW_SK_IndsatsDetaljer,    VisitationsId,    BestillingId, ServicePeriodeSlut  f
rom cte where rn=1

Upvotes: 0

Related Questions