ramadongre
ramadongre

Reputation: 101

Perform determinations within a data partition

I have a dataset as below from which I would like to draw some inferences.

Id Nbr Dt Status Cont1Sta1 DateLagInDays Recurrence
1 2 2023-10-1 1
1 2 2023-11-2 0
1 2 2023-12-13 0
1 3 2023-10-1 0
1 3 2023-11-2 0
1 3 2023-12-13 1
1 9 2023-11-1 0
1 9 2023-12-1 1
1 9 2024-1-1 0

I have already created a SQL server data partition based on ID, number, and ordered by dt in ascending order.

The inferences needed are:

  1. For the chosen partition are there any instances with status = 1.
  2. If #1 is true, the date difference between the earliest of the instances where status=0 until the row has status=1.
  1. Within the partition, are there any new rows disregarding status value after there is at least one row with status=1?

Is this possible using partition basic logic? I did try using lag and lead within the partition, but it would not yield a good result.

Any suggestion to write a good concise code will be helpful.

While I tried few other things, I have below clean code that I am using so far. I would like to have the columns Cont1Sta1, DateLagInDays, Recurrence be filled per partition after transformation, in the first instance row.

declare @t table
(
    id int,
    nbr int,
    dt date,
    status smallint,
    Cont1Sta1 bit,--if the chosen partition has atleast one status=1
    DateLagInDays int,--date diff in days from earliest record within partition to when status=1
    Recurrence bit --does partition has atleast one new row after one possible row that has status=1
)
insert into @t(id,nbr,dt,status) select 1,9,'2023-11-1',0
insert into @t(id,nbr,dt,status) select 1,9,'2023-12-1',1
insert into @t(id,nbr,dt,status) select 1,9,'2024-1-1',0
insert into @t(id,nbr,dt,status) select 1,2,'2023-10-1',1
insert into @t(id,nbr,dt,status) select 1,2,'2023-11-2',0
insert into @t(id,nbr,dt,status) select 1,2,'2023-12-13',0
insert into @t(id,nbr,dt,status) select 1,3,'2023-10-1',0
insert into @t(id,nbr,dt,status) select 1,3,'2023-11-2',0
insert into @t(id,nbr,dt,status) select 1,3,'2023-12-13',1;

select 
    id, nbr, dt,status,
    rank() over (partition by id, nbr, status order by id, nbr, dt asc, status desc) rownbr
from 
    @t
order by 
    id, nbr, dt asc, status desc;

Here is the desired tabular result:

Id Nbr Dt Status Cont1Sta1 DateLagInDays Recurrence
1 2 2023-10-1 0 1 32 1
1 2 2023-11-2 1 0 0 0
1 2 2023-12-13 0 0 0 0
1 3 2023-10-1 0 1 73 0
1 3 2023-11-2 0 0 0 0
1 3 2023-12-13 1 0 0 0
1 9 2023-11-1 1 1 0 1
1 9 2023-12-1 0 0 0 0
1 9 2024-1-1 0 0 0 0

Upvotes: 0

Views: 103

Answers (2)

Charlieface
Charlieface

Reputation: 72229

After some clarification, it seems you only want the first row per group. You can use row-numbering for that.

The rest of your calculations can be done with conditional aggregation using window functions.

WITH cte AS (
    SELECT
      t.id,
      t.nbr,
      t.dt,
      t.status,
      COUNT(CASE WHEN t.status = 1 THEN 1 END) OVER (PARTITION BY t.ID, t.nbr) AS Cont1Sta1,
      ROW_NUMBER() OVER (PARTITION BY t.ID, t.nbr ORDER BY t.dt) AS rn,
      DATEDIFF(day, t.dt, MIN(CASE WHEN t.status = 1 THEN t.dt END) OVER (PARTITION BY t.ID, t.nbr)) AS DateLagInDays,
      IIF(
          MAX(t.dt) OVER (PARTITION BY t.ID, t.nbr) >
          MIN(CASE WHEN t.status = 1 THEN t.dt END) OVER (PARTITION BY t.ID, t.nbr),
        1, 0) AS Recurrence
    FROM @t t
)
SELECT *
FROM cte
WHERE rn = 1;

db<>fiddle

Upvotes: 2

Dale K
Dale K

Reputation: 27388

Comments explaining the logic are embedded in the query.

with cte as (
    select id, nbr, dt, status
        -- get a row number so we can only show the calculated data once per partition
        , row_number() over (partition by id, nbr order by id, nbr, dt asc, status desc) rn
        -- Determine whether status 1 exists within the partition
        , max(status) over (partition by id, nbr) partition_status
        -- Get the first date where Status = 1
        , min(case when Status = 1 then dt else null end) over (partition by id, nbr) partition_dt1
        -- Get the last date where Status = 1
        -- Only required if there is the possibility of more than one Status = 1 row in a partition
        , max(case when Status = 1 then dt else null end) over (partition by id, nbr) partition_dt2
    from @t
)
select id, nbr, dt, status
    -- On the first row of each partition show whether a status = 1 exists
    , case when rn = 1 then partition_status else 0 end Cont1Sta1
    -- On the first row of each partition show the date lag in days
    , case when rn = 1 then datediff(day, dt, coalesce(partition_dt1, dt)) else 0 end DateLagInDays
    -- On the first row of each partition show whether any records exist after the last Status = 1
    , case when rn = 1 and exists (select 1 from cte c2 where c2.id = c1.id and c2.nbr = c1.nbr and c2.dt > c1.partition_dt2) then 1 else 0 end Recurrence
from cte c1
order by id, nbr, dt asc, status desc;

Returns as requested:

id nbr dt status Cont1Sta1 DateLagInDays Recurrence
1 2 2023-10-01 1 1 0 1
1 2 2023-11-02 0 0 0 0
1 2 2023-12-13 0 0 0 0
1 3 2023-10-01 0 1 73 0
1 3 2023-11-02 0 0 0 0
1 3 2023-12-13 1 0 0 0
1 9 2023-11-01 0 1 30 1
1 9 2023-12-01 1 0 0 0
1 9 2024-01-01 0 0 0 0

DBFiddle

Upvotes: 3

Related Questions