Reputation: 55584
I've got a table which contains start-times (using number in example to keep it simple), and durations of events.
I would like to identify "blocks" and their start- and the end-time.
Whenever the difference between the end-time (start-time + duration) of the previous row (sorted by start-time) and the start-time of the current row is >=5
, a new "block" should begin.
This is my test-data, including an attempt of a graphical explanation in the comments:
WITH test_data AS (
SELECT 0 s, 2 dur FROM dual UNION ALL --# ■■
SELECT 2 , 2 FROM dual UNION ALL --# ■■
SELECT 10 , 1 FROM dual UNION ALL --# ■
SELECT 13 , 4 FROM dual UNION ALL --# ■■■■
SELECT 15 , 4 FROM dual --# ■■■■
)
--# Should return
--# 0 .. 4 --# ■■■■
--# 10 .. 19 --# ■■■■■■■■■
The first block starts at 0
and ends at 4
. Since the difference to the next row is >=5
, start another block at 10
which ends at 19
.
I can identify the first row of a block, using LAG
, but I have not yet found out how to proceed.
And I could solve the problem in a PL/SQL-loop, but I'm trying to avoid that for performance reasons.
Any suggestions on how to write this query?
Thanks in advance, Peter
Upvotes: 6
Views: 428
Reputation: 67752
I use subqueries with analytics to identify and group contiguous ranges:
SQL> WITH test_data AS (
2 SELECT 0 s, 2 dur FROM dual UNION ALL --# ■■
3 SELECT 2 , 2 FROM dual UNION ALL --# ■■
4 SELECT 10 , 1 FROM dual UNION ALL --# ■
5 SELECT 13 , 4 FROM dual UNION ALL --# ■■■■
6 SELECT 15 , 4 FROM dual --# ■■■■
7 )
8 SELECT MIN(s) "begin", MAX(s + dur) "end"
9 FROM (SELECT s, dur, SUM(gap) over(ORDER BY s) my_group
10 FROM (SELECT s, dur,
11 CASE
12 WHEN lag(s + dur) over(ORDER BY s) >= s - 5 THEN
13 0
14 ELSE
15 1
16 END gap
17 FROM test_data
18 ORDER BY s))
19 GROUP BY my_group;
begin end
---------- ----------
0 4
10 19
Upvotes: 4
Reputation: 2895
The code gets a bit complicated with a number of subqueries, etc. The may be instances of data where this doesn't work but I can't think of any off the top of my head.
Working with temporal data is always a pain!
WITH test_data AS (
SELECT 0 s, 2 dur FROM dual UNION ALL --# ■■
SELECT 2 , 2 FROM dual UNION ALL --# ■■
SELECT 10 , 1 FROM dual UNION ALL --# ■
SELECT 13 , 4 FROM dual UNION ALL --# ■■■■
SELECT 15 , 4 FROM dual --# ■■■■
)
select
-- Group on each block
min(start_time) as s,
max(end_time) - min(start_time) as dur
from (
select
start_time,
duration,
end_time,
-- number the blocks sequentially
sum(is_block_start) over (order by start_time) as block_num
from (
select
start_time,
duration,
end_time,
-- Mark the start of each block
case
when nvl2(prev_end_time, start_time - prev_end_time,5) >= 5
then 1 else 0 end as is_block_start
from (
select
s as start_time,
dur as duration,
s+dur as end_time,
lag(s+dur) over (order by s) prev_end_time
from test_data
)
)
)
group by block_num
Upvotes: 2
Reputation: 10337
There is a fantastic book by Richard Snodgrass which may help: Developing Time-Oriented Database Applications in SQL (free to download) which I have found invaluable when dealing with time in databases.
Have a look on Richards page for links to some book corrections and the associated CD-ROM in zip format.
Upvotes: 1
Reputation: 7420
In MS-SQL I would use ROW_NUMBER() OVER(ORDER BY starttime) AS Rank
to rank the rows on start time.
Then, I would write a query to join each line to the line with previous Rank and set a flag if the difference is bigger than five or NULL (first row).
Then, I would select all rows having this flag which are start rows, and for this subset repeat the process of numbering rows and joining to the next row to get the time spans:
blockstarttime1 nextstarttime1 (=starttime2)
blockstarttime2 nextstarttime2 (=starttime3)
blockstarttime3 NULL
Finally, this dataset can be joined to the original data with a WHERE starttime BETWEEN blockstarttime and nextstarttime
to partition the results.
Up to you to translate this to Oracle...
Upvotes: 1