Nathan
Nathan

Reputation: 63

SQL query to find a subseries of non overlapping intervals within a series of overlapping (time) intervals

I have a series of possibly overlapping numbered time intervals. Important: No two intervals begin at the same time, start time of intervals is strictly insreasing.

Illustration:

Task 1:  1111111
Task 2:     22222222222    
Task 3:             333333333333333
Task 4:                 444444
Task 5:                         5555555
Task 6:                                  66
   .
   .
   .
        0 --- time axis --->

The intervals represent tasks that should be performed. I am looking for a SQL query that selects tasks that can be performed, given the constraint that only one task can be performed at the same time. The first task is always performed. Next, from all tasks that begin after the first task finishes, the task that starts at the earliest time is performed. And so on.

Result: tasks 1, 3 and 6 can be performed. Illustration:

Task 1:  1111111                             (yes, first)
Task 2:     -----------                      (no, task 1 is running when 2 begins)
Task 3:             333333333333333          (yes)
Task 4:                 ------               (no, task 3 is running when 4 begins)
Task 5:                         -------      (no, task 3 is running when 5 begins)
Task 6:                                  66  (yes)
   .
   .
   .
        0 --- time axis --->

Using iteration, the algorithm is easy: in one loop iterate over intervals in ascending order remembering the end of the last selected interval. However, I would like to ask you for a SQL query, possibly using window functions, that can be performed eg. on Google BigQuery.

Schema of the tasks table:

task_id: integer,
start_timestamp: integer,
duration_seconds: integer.

Sample data:

task_id,start_timestamp,duration_seconds 
1,1,7
2,4,11
3,12,15
4,16,6
5,24,7
6,33,2
7,37,4
8,42,13
9,47,3
10,50,2
11,54,21
12,58,14
13,66,8
14,72,7
15,80,6
16,88,16
17,92,14
18,102,3
19,109,2
20,119,10
21,123,13
22,128,21
23,138,7
24,141,17
25,146,9
26,154,17
27,160,17
28,164,13
29,173,21
30,181,7

Result - selected tasks:

1,3,6,7,8,12,14,15,16,19,20,23,25,27,30

Illustration of sample data:

Task  1:  1111111
Task  2:     22222222222
Task  3:             333333333333333
Task  4:                 444444
Task  5:                         5555555
Task  6:                                  66
Task  7:                                      7777
Task  8:                                           8888888888888
Task  9:                                                999
Task 10:                                                   10
Task 11:                                                       11xxxxxxxxxxxxxxxxxxx
Task 12:                                                           12xxxxxxxxxxxx
Task 13:                                                                   13xxxxxx
Task 14:                                                                         14xxxxx
Task 15:                                                                                 15xxxx
Task 16:                                                                                         16xxxxxxxxxxxxxx
Task 17:                                                                                             17xxxxxxxxxxxx
Task 18:                                                                                                       18x
Task 19:                                                                                                              19
Task 20:                                                                                                                        20xxxxxxxx
Task 21:                                                                                                                            21xxxxxxxxxxx
Task 22:                                                                                                                                 22xxxxxxxxxxxxxxxxxxx
Task 23:                                                                                                                                           23xxxxx
Task 24:                                                                                                                                              24xxxxxxxxxxxxxxx
Task 25:                                                                                                                                                   25xxxxxxx
Task 26:                                                                                                                                                           26xxxxxxxxxxxxxxx
Task 27:                                                                                                                                                                 27xxxxxxxxxxxxxxx
Task 28:                                                                                                                                                                     28xxxxxxxxxxx
Task 29:                                                                                                                                                                              29xxxxxxxxxxxxxxxxxxx
Task 30:                                                                                                                                                                                      30xxxxx

Thank you very much for any help.

Upvotes: 2

Views: 814

Answers (2)

W_O_L_F
W_O_L_F

Reputation: 1486

Just wrote something very related (in oracle), maybe it can help someone. I found my solution on http://technology.amis.nl/2007/05/07/creating-a-gantt-chart-in-sql/

Here is a peice of code helping to illustrate the gantt schema.

WITH 
PERIODS AS
  ( SELECT  TASK_ID LABEL  ,      
            START_TIMESTAMP START_DATE  ,      
            START_TIMESTAMP+DURATION_SECONDS END_DATE  
    FROM   testet
  ), 
LIMITS AS -- determine the earliest starting date and the latest end date to determine the overall width of the chart
  ( SELECT  MIN(START_DATE) PERIOD_START  ,      
            MAX(END_DATE) PERIOD_END  ,      
            80 WIDTH -- set the width as the number of characters  
    FROM   PERIODS
  ), 
BARS AS
  ( SELECT   LPAD(LABEL, '20')||'|' ACTIVITY  ,        
            (START_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START) * WIDTH FROM_POS, -- the starting position for the bar          
            (END_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START)   * WIDTH TO_POS   -- the end position for the bar  
    FROM     PERIODS  ,        LIMITS
  ) 
SELECT  ACTIVITY||        
        LPAD('I',FROM_POS)         ||
        RPAD('-', TO_POS - FROM_POS, '-')         ||
        'I' GANTT
FROM     BARS
UNION ALL
SELECT RPAD('_',WIDTH + 22,'_')
FROM   LIMITS
UNION ALL
SELECT  LPAD('|',21)       ||
        PERIOD_START       ||
        LPAD(PERIOD_END, 
        WIDTH - 11)
FROM   LIMITS;

the output:

                   1|--I
                   2|I----I
                   3|   I------I
                   4|     I--I
                   5|        I--I
                   6|            II
                   7|             I-I
                   8|               I-----I
                   9|                  I-I
                  10|                   II
                  11|                    I--------I
                  12|                      I-----I
                  13|                         I---I
                  14|                            I--I
                  15|                               I--I
                  16|                                   I------I
                  17|                                    I-----I
                  18|                                        I-I
                  19|                                           II
                  20|                                               I----I
                  21|                                                 I-----I
                  22|                                                   I--------I
                  23|                                                       I--I
                  24|                                                         I-------I
                  25|                                                           I---I
                  26|                                                              I-------I
                  27|                                                                I-------I
                  28|                                                                  I-----I
                  29|                                                                      I--------I
                  30|                                                                         I--I
______________________________________________________________________________________________________
                    |1                                                                  194

Upvotes: 0

N.N.
N.N.

Reputation: 3172

One way to do this is as follows: Find overlapping tasks (start time is between start time and end time of other task), and than extract all other tasks.

Select task_id
FROM [table]
where Task_id not in(    
    Select B.task_id FROM
    (SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
    FROM [table] ) as A
    CROSS JOIN EACH
    (SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
    FROM [table] ) as B
    where B.start_timestamp>=A.start_timestamp
    and B.start_timestamp<A.end_timestamp
    and A.task_id<>b.task_id)

This solution is not using window functions.

With window functions its feasible but you have to assume a limit of concurrent parallel jobs (in this example 3). here i am using the LAG window function to find the 3 predecessor tasks and check if a certain task overlaps one of them (start time is between start time and end time of prev starting task)

Select task_id
FROM
(Select task_id, start_timestamp, duration_seconds ,end_timestamp
,LAG(task_id,1) OVER (ORDER BY start_timestamp) as LAG_task_id_1
,LAG(start_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_1
,LAG(duration_seconds,1) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_1
,LAG(end_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_1
,LAG(task_id,2) OVER (ORDER BY start_timestamp) as LAG_task_id_2
,LAG(start_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_2
,LAG(duration_seconds,21) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_2
,LAG(end_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_2
,LAG(task_id,3) OVER (ORDER BY start_timestamp) as LAG_task_id_3
,LAG(start_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_3
,LAG(duration_seconds,3) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_3
,LAG(end_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_3
FROM
(SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
FROM [table] ))
where 
(NOT(start_timestamp>=LAG_start_timestamp_1 and start_timestamp<LAG_end_timestamp_1)
and NOT(start_timestamp>=LAG_start_timestamp_2 and start_timestamp<LAG_end_timestamp_2)
and NOT(start_timestamp>=LAG_start_timestamp_3 and start_timestamp<LAG_end_timestamp_3))
OR LAG_start_timestamp_1 IS NULL

Hope this helps...

Upvotes: 2

Related Questions