Vitor Mira
Vitor Mira

Reputation: 93

Merge Datetime Ranges Oracle SQL or PL/SQL

I have been struggling to merge datetime ranges in oracle SQL or PL/SQL (Database Standard Edition 11gR2).

I am trying to merge datetime ranges so that the following data

order_id    start_date_time         end_date_time
3933        04/02/2020 08:00:00     04/02/2020 12:00:00
3933        04/02/2020 13:30:00     04/02/2020 17:00:00
3933        04/02/2020 14:00:00     04/02/2020 19:00:00
3933        05/02/2020 13:40:12     05/02/2020 14:34:48
3933        05/02/2020 14:00:00     05/02/2020 18:55:12
3933        05/02/2020 14:49:48     05/02/2020 15:04:48
3933        06/02/2020 08:00:00     06/02/2020 12:00:00
3933        06/02/2020 13:30:00     06/02/2020 17:00:00
3933        06/02/2020 14:10:12     06/02/2020 18:49:48
3933        07/02/2020 08:00:00     07/02/2020 10:30:00
3933        07/02/2020 08:00:00     07/02/2020 12:00:00
3933        07/02/2020 13:30:00     07/02/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       14/05/2020 15:00:00     14/05/2020 16:30:00
11919       15/05/2020 08:40:12     15/05/2020 16:30:00
11919       15/05/2020 09:40:12     15/05/2020 16:30:00
11919       15/05/2020 10:15:00     15/05/2020 12:15:00
11919       15/05/2020 13:19:48     15/05/2020 16:00:00
11919       18/05/2020 08:49:48     18/05/2020 09:45:00
11919       18/05/2020 10:00:00     18/05/2020 17:00:00
11919       18/05/2020 10:00:00     18/05/2020 16:58:12
11919       18/05/2020 15:34:48     18/05/2020 16:10:12
11919       18/05/2020 16:30:00     18/05/2020 16:45:00
...         ...                     ...

would transform into the following result set

--after merge (this is the result I am seeking)
order_id    start_date_time         end_date_time
3933        04/02/2020 08:00:00     04/02/2020 12:00:00
3933        04/02/2020 13:30:00     04/02/2020 19:00:00
3933        05/02/2020 13:40:12     05/02/2020 18:55:12
3933        06/02/2020 08:00:00     06/02/2020 12:00:00
3933        06/02/2020 13:30:00     06/02/2020 18:49:48
3933        07/02/2020 08:00:00     07/02/2020 12:00:00
3933        07/02/2020 13:30:00     07/02/2020 17:00:00
11919       14/05/2020 09:00:00     14/05/2020 17:00:00
11919       15/05/2020 08:40:12     15/05/2020 16:30:00
11919       18/05/2020 08:49:48     18/05/2020 17:00:00
...         ...                     ...

The format of start_date_time and end_date_time is DAY/MONTH/YEAR HH24:MI:SS.

Any suggestion/solution on how to make that merge in Oracle SQL or PL/SQL?

I thought that was a trivial problem, however I was not able to find a solution on the internet yet.

Thanks in advance.

Upvotes: 1

Views: 1227

Answers (3)

Marcy
Marcy

Reputation: 1

This is how I solved. Imagine an ORDERS table with ORDERID, DATE_START and DATE_END. The innermost query (A) gets the previous end date, the second innermost subquery (B) detects overlaps (or better nooverlaps), the third sums nooverlap to create groups (GID). Finally the outermost query aggregates these groups to create the final intervals for each ORDERID.

SELECT C.ORDERID, MIN(C.DATE_START ) DATE_START , MAX(C.DATE_END) DATE_END
FROM
(    
SELECT B.*,SUM(B.NOOVERLAP) over(PARTITION BY B.ORDERID order by B.DATE_START ) as GID
    FROM
    (
    SELECT  A.*,
            CASE WHEN A.PREV_DATE_END >= A.DATE_START THEN 0 ELSE 1 END NOOVERLAP
    FROM
    (
        SELECT  ORDERID,DATE_START ,DATE_END, 
                LAG(DATE_END) OVER(PARTITION BY ORDERID ORDER BY DATE_START ) PREV_DATE_END 
        FROM    ORDERS
    ) A
    ) B
) C
GROUP BY C.ORDERID , C.GID

Upvotes: 0

MT0
MT0

Reputation: 167982

This is adapted from this answer which contains an explanation of the code. All that has changed is to add PARTITION BY order_id to calculate the date ranges for each order_id and then to return the ranges (rather than total the values, as per the linked answer):

SELECT order_id,
       start_date_time,
       end_date_time
FROM   (
  SELECT order_id,
         LAG( dt ) OVER ( PARTITION BY order_id ORDER BY dt ) AS start_date_time,
         dt AS end_date_time,
         start_end
  FROM   (
    SELECT order_id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY order_id ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   table_name
    UNPIVOT ( dt FOR value IN ( start_date_time AS 1, end_date_time AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE  start_end = 'end';

From Oracle 12, you can use MATCH_RECONIZE to do row-by-row processing:

SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER     BY start_date_time
  MEASURES
    FIRST(start_date_time) AS start_date_time,
    MAX(end_date_time)     AS end_date_time
  ONE ROW PER MATCH
  PATTERN (overlapping_rows* last_row)
  DEFINE
    overlapping_rows AS NEXT(start_date_time) <= MAX(end_date_time)
)

Which, for your test data:

CREATE TABLE table_name (
  order_id NUMBER,
  start_date_time DATE,
  end_date_time DATE
);

INSERT INTO table_name ( order_id, start_date_time, end_date_time )
SELECT 3933, TIMESTAMP '2020-02-04 08:00:00', TIMESTAMP '2020-02-04 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 13:30:00', TIMESTAMP '2020-02-04 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 14:00:00', TIMESTAMP '2020-02-04 19:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 13:40:12', TIMESTAMP '2020-02-05 14:34:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:00:00', TIMESTAMP '2020-02-05 18:55:12' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:49:48', TIMESTAMP '2020-02-05 15:04:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 08:00:00', TIMESTAMP '2020-02-06 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 13:30:00', TIMESTAMP '2020-02-06 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 14:10:12', TIMESTAMP '2020-02-06 18:49:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 10:30:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 13:30:00', TIMESTAMP '2020-02-07 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 15:00:00', TIMESTAMP '2020-05-14 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 08:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 09:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 10:15:00', TIMESTAMP '2020-05-15 12:15:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 13:19:48', TIMESTAMP '2020-05-15 16:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 08:49:48', TIMESTAMP '2020-05-18 09:45:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 16:58:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 15:34:48', TIMESTAMP '2020-05-18 16:10:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 16:30:00', TIMESTAMP '2020-05-18 16:45:00' FROM DUAL;

Which both output:

ORDER_ID | START_DATE_TIME     | END_DATE_TIME      
-------: | :------------------ | :------------------
    3933 | 2020-02-04 08:00:00 | 2020-02-04 12:00:00
    3933 | 2020-02-04 13:30:00 | 2020-02-04 19:00:00
    3933 | 2020-02-05 13:40:12 | 2020-02-05 18:55:12
    3933 | 2020-02-06 08:00:00 | 2020-02-06 12:00:00
    3933 | 2020-02-06 13:30:00 | 2020-02-06 18:49:48
    3933 | 2020-02-07 08:00:00 | 2020-02-07 12:00:00
    3933 | 2020-02-07 13:30:00 | 2020-02-07 17:00:00
   11919 | 2020-05-14 09:00:00 | 2020-05-14 17:00:00
   11919 | 2020-05-15 08:40:12 | 2020-05-15 16:30:00
   11919 | 2020-05-18 08:49:48 | 2020-05-18 09:45:00
   11919 | 2020-05-18 10:00:00 | 2020-05-18 17:00:00

db<>fiddle here

Upvotes: 2

user5683823
user5683823

Reputation:

The solution below uses a common method known as the "start of group" method.

The idea is to order the intervals by start date (separately for each id), and to assign intervals to groups as follows. For each interval, check if its start time is strictly greater than the MAX of end times of all preceding intervals. If it is, that starts a new group. The rest is easy - just select the MIN start date and the MAX end date from each group.

Here is how this is implemented, using analytic functions:

with
  has_sog_flags (order_id, start_date_time, end_date_time, flag) as (
    select order_id, start_date_time, end_date_time,
           case when start_date_time > 
                      max(end_date_time) over (partition by order_id
                                    order by start_date_time
                    rows between unbounded preceding and 1 preceding) 
                then 1 end
    from   table_name
  )
, has_groups (order_id, start_date_time, end_date_time, grp) as (
    select order_id, start_date_time, end_date_time,
           sum(flag) over (partition by order_id order by start_date_time)
    from   has_sog_flags
  )
select order_id, min(start_date_time) as start_date_time, 
       max(end_date_time) as end_date_time
from   has_groups
group  by order_id, grp
order  by order_id, start_date_time
;

An interesting question is how to handle open-ended intervals (where for example null for end_date_time means "open ended into the future". The query can be adapted relatively easily to cover such extensions to the problem statement.

Upvotes: 0

Related Questions