DJC
DJC

Reputation: 1611

Splitting a table to varying number of rows with unknown grouping

I have a table with information about service involvement and group assignments for two individuals. A person is identified by UNIQ_ID; service start and end dates by START_DT and END_DT; and their group assignment by GRP, which is assigned on ASSIGN_DT.

UNIQ_ID   START_DT   END_DT   ASSIGN_DT   GRP
888       1-Mar      10-Mar   3-Mar       Red
888       1-Mar      10-Mar   7-Mar       Yellow
999       15-Mar     20-Mar   15-Mar      Yellow

A person does not always get their group assignment on their first day - 888 does not, while 999 does. Until a person gets their group assignment, their GRP should be 'Unknown'. Assignment starts on the day of assignment, rather than the following day, and ends the day before the next group assignment, or when their service ends.

Can anyone help me with an efficient way to get to the following result set?

UNIQ_ID   ASSIGN_START_DT   ASSIGN_END_DT     GRP
888       1-Mar             2-Mar             Unknown
888       3-Mar             6-Mar             Red
888       7-Mar             10-Mar            Yellow
999       15-Mar            20-Mar            Yellow

Reproducible code below:

CREATE TABLE EXAMPLE_1
(
   UNIQ_ID     NUMBER,
   START_DT    DATE,
   END_DT      DATE,
   ASSIGN_DT   DATE,
   GRP         VARCHAR2 (10)
);
alter session set nls_date_format='dd-mon-yy';
INSERT INTO EXAMPLE_1 VALUES (888, TO_DATE('01-MAR-20'), TO_DATE('10-MAR-20'), TO_DATE('03-MAR-20'), 'Red');
INSERT INTO EXAMPLE_1 VALUES (888, TO_DATE('01-MAR-20'), TO_DATE('10-MAR-20'), TO_DATE('07-MAR-20'), 'Yellow');
INSERT INTO EXAMPLE_1 VALUES (999, TO_DATE('15-MAR-20'), TO_DATE('20-MAR-20'), TO_DATE('15-MAR-20'), 'Yellow');

This is my current solution. Is there any way to do this more efficiently?

WITH SEQ1
     AS (SELECT DISTINCT "UNIQ_ID", "DT", "END_DT"
           FROM EXAMPLE_1
                UNPIVOT
                   ("DT" FOR "TIMES" IN ("START_DT", "ASSIGN_DT"))),
     SEQ2
     AS (SELECT t1."UNIQ_ID",
                "DT",
                NVL (
                   LEAD ("DT" - 1, 1)
                      OVER (PARTITION BY "UNIQ_ID" ORDER BY "DT"),
                   "END_DT")
                   AS "END_DT"
           FROM SEQ1 t1),
     SEQ3
     AS (SELECT t1.*, NVL (t2."GRP", 'Unknown') AS "GRP"
           FROM SEQ2 t1
                LEFT JOIN EXAMPLE_1 t2
                   ON     t1."UNIQ_ID" = t2."UNIQ_ID"
                      AND t2."ASSIGN_DT" BETWEEN t1."DT" AND t1."END_DT")
  SELECT *
    FROM SEQ3
ORDER BY 1, 2;

Upvotes: 0

Views: 58

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269633

My idea is to combine the state changes into a single table and then use window functions. This is a little complicated because of overlapping timeframes. But for the data you provided, this works:

with ug as (
       select uniq_id, grp, assign_dt, end_dt
       from example_1
       union all
       select uniq_id, null, start_dt, assign_dt
       from example_1
       where assign_dt <> start_dt
      )
select uniq_id, grp, assign_dt,
       lead(assign_dt, 1, min(end_dt) - 1) over (partition by uniq_id order by assign_dt) as end_dt
from ug
group by uniq_id, grp, assign_dt
order by uniq_id, assign_dt;

Here is a db<>fiddle.

Upvotes: 1

Sayan Malakshinov
Sayan Malakshinov

Reputation: 8655

From what I see there is no need in "Unknown" group, so you can easily get all real assignments with lead():

WITH SEQ1
     AS (SELECT
            "UNIQ_ID",
            assign_dt,
            least(end_dt, lead(assign_dt-1,1,end_dt)over(partition by uniq_id order by assign_dt)) as end_dt,
            grp
           FROM EXAMPLE_1)
  SELECT *
    FROM SEQ1
ORDER BY 1, 2;

And your original query doesn't work as you described, please try this sample data:

begin
INSERT INTO EXAMPLE_1 VALUES (888, TO_DATE('01-MAR-20'), TO_DATE('10-MAR-20'), TO_DATE('03-MAR-20'), 'Red');
INSERT INTO EXAMPLE_1 VALUES (888, TO_DATE('01-MAR-20'), TO_DATE('10-MAR-20'), TO_DATE('07-MAR-20'), 'Yellow');
INSERT INTO EXAMPLE_1 VALUES (999, TO_DATE('15-MAR-20'), TO_DATE('20-MAR-20'), TO_DATE('15-MAR-20'), 'Yellow');
-- extra:
INSERT INTO EXAMPLE_1 VALUES (111, TO_DATE('01-JUN-20'), TO_DATE('10-JUN-20'), TO_DATE('05-JUN-20'), 'Yellow');
INSERT INTO EXAMPLE_1 VALUES (111, TO_DATE('15-JUN-20'), TO_DATE('20-JUN-20'), TO_DATE('17-JUN-20'), 'Red');
INSERT INTO EXAMPLE_1 VALUES (111, TO_DATE('15-JUN-20'), TO_DATE('20-JUN-20'), TO_DATE('19-JUN-20'), 'Green');
INSERT INTO EXAMPLE_1 VALUES (111, TO_DATE('26-JUN-20'), TO_DATE('30-JUN-20'), TO_DATE('27-JUN-20'), 'Yellow');
INSERT INTO EXAMPLE_1 VALUES (111, TO_DATE('25-JUN-20'), TO_DATE('29-JUN-20'), TO_DATE('28-JUN-20'), 'Black');
commit;
end;
/

And describe what exactly do you want to see for 111

Upvotes: 1

Related Questions