Reputation: 63
I have a series of possibly overlapping numbered time intervals. Important: No two intervals begin at the same time, start time of intervals is strictly insreasing.
Illustration:
Task 1: 1111111
Task 2: 22222222222
Task 3: 333333333333333
Task 4: 444444
Task 5: 5555555
Task 6: 66
.
.
.
0 --- time axis --->
The intervals represent tasks that should be performed. I am looking for a SQL query that selects tasks that can be performed, given the constraint that only one task can be performed at the same time. The first task is always performed. Next, from all tasks that begin after the first task finishes, the task that starts at the earliest time is performed. And so on.
Result: tasks 1, 3 and 6 can be performed. Illustration:
Task 1: 1111111 (yes, first)
Task 2: ----------- (no, task 1 is running when 2 begins)
Task 3: 333333333333333 (yes)
Task 4: ------ (no, task 3 is running when 4 begins)
Task 5: ------- (no, task 3 is running when 5 begins)
Task 6: 66 (yes)
.
.
.
0 --- time axis --->
Using iteration, the algorithm is easy: in one loop iterate over intervals in ascending order remembering the end of the last selected interval. However, I would like to ask you for a SQL query, possibly using window functions, that can be performed eg. on Google BigQuery.
Schema of the tasks table:
task_id: integer,
start_timestamp: integer,
duration_seconds: integer.
Sample data:
task_id,start_timestamp,duration_seconds
1,1,7
2,4,11
3,12,15
4,16,6
5,24,7
6,33,2
7,37,4
8,42,13
9,47,3
10,50,2
11,54,21
12,58,14
13,66,8
14,72,7
15,80,6
16,88,16
17,92,14
18,102,3
19,109,2
20,119,10
21,123,13
22,128,21
23,138,7
24,141,17
25,146,9
26,154,17
27,160,17
28,164,13
29,173,21
30,181,7
Result - selected tasks:
1,3,6,7,8,12,14,15,16,19,20,23,25,27,30
Illustration of sample data:
Task 1: 1111111 Task 2: 22222222222 Task 3: 333333333333333 Task 4: 444444 Task 5: 5555555 Task 6: 66 Task 7: 7777 Task 8: 8888888888888 Task 9: 999 Task 10: 10 Task 11: 11xxxxxxxxxxxxxxxxxxx Task 12: 12xxxxxxxxxxxx Task 13: 13xxxxxx Task 14: 14xxxxx Task 15: 15xxxx Task 16: 16xxxxxxxxxxxxxx Task 17: 17xxxxxxxxxxxx Task 18: 18x Task 19: 19 Task 20: 20xxxxxxxx Task 21: 21xxxxxxxxxxx Task 22: 22xxxxxxxxxxxxxxxxxxx Task 23: 23xxxxx Task 24: 24xxxxxxxxxxxxxxx Task 25: 25xxxxxxx Task 26: 26xxxxxxxxxxxxxxx Task 27: 27xxxxxxxxxxxxxxx Task 28: 28xxxxxxxxxxx Task 29: 29xxxxxxxxxxxxxxxxxxx Task 30: 30xxxxx
Thank you very much for any help.
Upvotes: 2
Views: 814
Reputation: 1486
Just wrote something very related (in oracle), maybe it can help someone. I found my solution on http://technology.amis.nl/2007/05/07/creating-a-gantt-chart-in-sql/
Here is a peice of code helping to illustrate the gantt schema.
WITH
PERIODS AS
( SELECT TASK_ID LABEL ,
START_TIMESTAMP START_DATE ,
START_TIMESTAMP+DURATION_SECONDS END_DATE
FROM testet
),
LIMITS AS -- determine the earliest starting date and the latest end date to determine the overall width of the chart
( SELECT MIN(START_DATE) PERIOD_START ,
MAX(END_DATE) PERIOD_END ,
80 WIDTH -- set the width as the number of characters
FROM PERIODS
),
BARS AS
( SELECT LPAD(LABEL, '20')||'|' ACTIVITY ,
(START_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START) * WIDTH FROM_POS, -- the starting position for the bar
(END_DATE - PERIOD_START)/(PERIOD_END - PERIOD_START) * WIDTH TO_POS -- the end position for the bar
FROM PERIODS , LIMITS
)
SELECT ACTIVITY||
LPAD('I',FROM_POS) ||
RPAD('-', TO_POS - FROM_POS, '-') ||
'I' GANTT
FROM BARS
UNION ALL
SELECT RPAD('_',WIDTH + 22,'_')
FROM LIMITS
UNION ALL
SELECT LPAD('|',21) ||
PERIOD_START ||
LPAD(PERIOD_END,
WIDTH - 11)
FROM LIMITS;
the output:
1|--I
2|I----I
3| I------I
4| I--I
5| I--I
6| II
7| I-I
8| I-----I
9| I-I
10| II
11| I--------I
12| I-----I
13| I---I
14| I--I
15| I--I
16| I------I
17| I-----I
18| I-I
19| II
20| I----I
21| I-----I
22| I--------I
23| I--I
24| I-------I
25| I---I
26| I-------I
27| I-------I
28| I-----I
29| I--------I
30| I--I
______________________________________________________________________________________________________
|1 194
Upvotes: 0
Reputation: 3172
One way to do this is as follows: Find overlapping tasks (start time is between start time and end time of other task), and than extract all other tasks.
Select task_id
FROM [table]
where Task_id not in(
Select B.task_id FROM
(SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
FROM [table] ) as A
CROSS JOIN EACH
(SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
FROM [table] ) as B
where B.start_timestamp>=A.start_timestamp
and B.start_timestamp<A.end_timestamp
and A.task_id<>b.task_id)
This solution is not using window functions.
With window functions its feasible but you have to assume a limit of concurrent parallel jobs (in this example 3). here i am using the LAG window function to find the 3 predecessor tasks and check if a certain task overlaps one of them (start time is between start time and end time of prev starting task)
Select task_id
FROM
(Select task_id, start_timestamp, duration_seconds ,end_timestamp
,LAG(task_id,1) OVER (ORDER BY start_timestamp) as LAG_task_id_1
,LAG(start_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_1
,LAG(duration_seconds,1) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_1
,LAG(end_timestamp,1) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_1
,LAG(task_id,2) OVER (ORDER BY start_timestamp) as LAG_task_id_2
,LAG(start_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_2
,LAG(duration_seconds,21) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_2
,LAG(end_timestamp,2) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_2
,LAG(task_id,3) OVER (ORDER BY start_timestamp) as LAG_task_id_3
,LAG(start_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_start_timestamp_3
,LAG(duration_seconds,3) OVER (ORDER BY start_timestamp) as LAG_duration_seconds_3
,LAG(end_timestamp,3) OVER (ORDER BY start_timestamp) as LAG_end_timestamp_3
FROM
(SELECT task_id, start_timestamp, duration_seconds ,start_timestamp+duration_seconds as end_timestamp
FROM [table] ))
where
(NOT(start_timestamp>=LAG_start_timestamp_1 and start_timestamp<LAG_end_timestamp_1)
and NOT(start_timestamp>=LAG_start_timestamp_2 and start_timestamp<LAG_end_timestamp_2)
and NOT(start_timestamp>=LAG_start_timestamp_3 and start_timestamp<LAG_end_timestamp_3))
OR LAG_start_timestamp_1 IS NULL
Hope this helps...
Upvotes: 2