Reputation: 93
I have been struggling to merge datetime ranges in oracle SQL or PL/SQL (Database Standard Edition 11gR2).
I am trying to merge datetime ranges so that the following data
order_id start_date_time end_date_time
3933 04/02/2020 08:00:00 04/02/2020 12:00:00
3933 04/02/2020 13:30:00 04/02/2020 17:00:00
3933 04/02/2020 14:00:00 04/02/2020 19:00:00
3933 05/02/2020 13:40:12 05/02/2020 14:34:48
3933 05/02/2020 14:00:00 05/02/2020 18:55:12
3933 05/02/2020 14:49:48 05/02/2020 15:04:48
3933 06/02/2020 08:00:00 06/02/2020 12:00:00
3933 06/02/2020 13:30:00 06/02/2020 17:00:00
3933 06/02/2020 14:10:12 06/02/2020 18:49:48
3933 07/02/2020 08:00:00 07/02/2020 10:30:00
3933 07/02/2020 08:00:00 07/02/2020 12:00:00
3933 07/02/2020 13:30:00 07/02/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 14/05/2020 15:00:00 14/05/2020 16:30:00
11919 15/05/2020 08:40:12 15/05/2020 16:30:00
11919 15/05/2020 09:40:12 15/05/2020 16:30:00
11919 15/05/2020 10:15:00 15/05/2020 12:15:00
11919 15/05/2020 13:19:48 15/05/2020 16:00:00
11919 18/05/2020 08:49:48 18/05/2020 09:45:00
11919 18/05/2020 10:00:00 18/05/2020 17:00:00
11919 18/05/2020 10:00:00 18/05/2020 16:58:12
11919 18/05/2020 15:34:48 18/05/2020 16:10:12
11919 18/05/2020 16:30:00 18/05/2020 16:45:00
... ... ...
would transform into the following result set
--after merge (this is the result I am seeking)
order_id start_date_time end_date_time
3933 04/02/2020 08:00:00 04/02/2020 12:00:00
3933 04/02/2020 13:30:00 04/02/2020 19:00:00
3933 05/02/2020 13:40:12 05/02/2020 18:55:12
3933 06/02/2020 08:00:00 06/02/2020 12:00:00
3933 06/02/2020 13:30:00 06/02/2020 18:49:48
3933 07/02/2020 08:00:00 07/02/2020 12:00:00
3933 07/02/2020 13:30:00 07/02/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 15/05/2020 08:40:12 15/05/2020 16:30:00
11919 18/05/2020 08:49:48 18/05/2020 17:00:00
... ... ...
The format of start_date_time and end_date_time is DAY/MONTH/YEAR HH24:MI:SS.
Any suggestion/solution on how to make that merge in Oracle SQL or PL/SQL?
I thought that was a trivial problem, however I was not able to find a solution on the internet yet.
Thanks in advance.
Upvotes: 1
Views: 1227
Reputation: 1
This is how I solved. Imagine an ORDERS
table with ORDERID
, DATE_START
and DATE_END
. The innermost query (A
) gets the previous end date, the second innermost subquery (B
) detects overlaps (or better nooverlaps), the third sums nooverlap to create groups (GID
). Finally the outermost query aggregates these groups to create the final intervals for each ORDERID
.
SELECT C.ORDERID, MIN(C.DATE_START ) DATE_START , MAX(C.DATE_END) DATE_END
FROM
(
SELECT B.*,SUM(B.NOOVERLAP) over(PARTITION BY B.ORDERID order by B.DATE_START ) as GID
FROM
(
SELECT A.*,
CASE WHEN A.PREV_DATE_END >= A.DATE_START THEN 0 ELSE 1 END NOOVERLAP
FROM
(
SELECT ORDERID,DATE_START ,DATE_END,
LAG(DATE_END) OVER(PARTITION BY ORDERID ORDER BY DATE_START ) PREV_DATE_END
FROM ORDERS
) A
) B
) C
GROUP BY C.ORDERID , C.GID
Upvotes: 0
Reputation: 167982
This is adapted from this answer which contains an explanation of the code. All that has changed is to add PARTITION BY order_id
to calculate the date ranges for each order_id
and then to return the ranges (rather than total the values, as per the linked answer):
SELECT order_id,
start_date_time,
end_date_time
FROM (
SELECT order_id,
LAG( dt ) OVER ( PARTITION BY order_id ORDER BY dt ) AS start_date_time,
dt AS end_date_time,
start_end
FROM (
SELECT order_id,
dt,
CASE SUM( value ) OVER ( PARTITION BY order_id ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM table_name
UNPIVOT ( dt FOR value IN ( start_date_time AS 1, end_date_time AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end';
From Oracle 12, you can use MATCH_RECONIZE
to do row-by-row processing:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY order_id
ORDER BY start_date_time
MEASURES
FIRST(start_date_time) AS start_date_time,
MAX(end_date_time) AS end_date_time
ONE ROW PER MATCH
PATTERN (overlapping_rows* last_row)
DEFINE
overlapping_rows AS NEXT(start_date_time) <= MAX(end_date_time)
)
Which, for your test data:
CREATE TABLE table_name (
order_id NUMBER,
start_date_time DATE,
end_date_time DATE
);
INSERT INTO table_name ( order_id, start_date_time, end_date_time )
SELECT 3933, TIMESTAMP '2020-02-04 08:00:00', TIMESTAMP '2020-02-04 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 13:30:00', TIMESTAMP '2020-02-04 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 14:00:00', TIMESTAMP '2020-02-04 19:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 13:40:12', TIMESTAMP '2020-02-05 14:34:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:00:00', TIMESTAMP '2020-02-05 18:55:12' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:49:48', TIMESTAMP '2020-02-05 15:04:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 08:00:00', TIMESTAMP '2020-02-06 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 13:30:00', TIMESTAMP '2020-02-06 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 14:10:12', TIMESTAMP '2020-02-06 18:49:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 10:30:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 13:30:00', TIMESTAMP '2020-02-07 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 15:00:00', TIMESTAMP '2020-05-14 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 08:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 09:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 10:15:00', TIMESTAMP '2020-05-15 12:15:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 13:19:48', TIMESTAMP '2020-05-15 16:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 08:49:48', TIMESTAMP '2020-05-18 09:45:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 16:58:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 15:34:48', TIMESTAMP '2020-05-18 16:10:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 16:30:00', TIMESTAMP '2020-05-18 16:45:00' FROM DUAL;
Which both output:
ORDER_ID | START_DATE_TIME | END_DATE_TIME -------: | :------------------ | :------------------ 3933 | 2020-02-04 08:00:00 | 2020-02-04 12:00:00 3933 | 2020-02-04 13:30:00 | 2020-02-04 19:00:00 3933 | 2020-02-05 13:40:12 | 2020-02-05 18:55:12 3933 | 2020-02-06 08:00:00 | 2020-02-06 12:00:00 3933 | 2020-02-06 13:30:00 | 2020-02-06 18:49:48 3933 | 2020-02-07 08:00:00 | 2020-02-07 12:00:00 3933 | 2020-02-07 13:30:00 | 2020-02-07 17:00:00 11919 | 2020-05-14 09:00:00 | 2020-05-14 17:00:00 11919 | 2020-05-15 08:40:12 | 2020-05-15 16:30:00 11919 | 2020-05-18 08:49:48 | 2020-05-18 09:45:00 11919 | 2020-05-18 10:00:00 | 2020-05-18 17:00:00
db<>fiddle here
Upvotes: 2
Reputation:
The solution below uses a common method known as the "start of group" method.
The idea is to order the intervals by start date (separately for each id), and to assign intervals to groups as follows. For each interval, check if its start time is strictly greater than the MAX of end times of all preceding intervals. If it is, that starts a new group. The rest is easy - just select the MIN start date and the MAX end date from each group.
Here is how this is implemented, using analytic functions:
with
has_sog_flags (order_id, start_date_time, end_date_time, flag) as (
select order_id, start_date_time, end_date_time,
case when start_date_time >
max(end_date_time) over (partition by order_id
order by start_date_time
rows between unbounded preceding and 1 preceding)
then 1 end
from table_name
)
, has_groups (order_id, start_date_time, end_date_time, grp) as (
select order_id, start_date_time, end_date_time,
sum(flag) over (partition by order_id order by start_date_time)
from has_sog_flags
)
select order_id, min(start_date_time) as start_date_time,
max(end_date_time) as end_date_time
from has_groups
group by order_id, grp
order by order_id, start_date_time
;
An interesting question is how to handle open-ended intervals (where for example null
for end_date_time means "open ended into the future". The query can be adapted relatively easily to cover such extensions to the problem statement.
Upvotes: 0