Reputation: 3113
I have this table
+-----+-----+------------+----------+---+
| Row | id | start_time | end_time | |
+-----+-----+------------+----------+---+
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 23:00:00 | |
| 3 | baz | 08:00:00 | 11:00:00 | |
| 4 | qux | 05:30:00 | 07:30:00 | |
+-----+-----+------------+----------+---+
which can be generated with this
WITH TABLE AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time
UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0)
UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0)
UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0)
)
SELECT * FROM TABLE
If the 22:00 hour is inside the time interval, the row need to be splitted around the hour itself.
For the given input, the output should be:
+-----+-----+------------+----------+---+
| Row | id | start_time | end_time | |
+-----+-----+------------+----------+---+
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 22:00:00 | |
| 3 | bar | 22:00:00 | 23:00:00 | |
| 4 | baz | 08:00:00 | 11:00:00 | |
| 5 | qux | 05:30:00 | 06:00:00 | |
| 6 | qux | 06:00:00 | 07:30:00 | |
+-----+-----+------------+----------+---+
Please note that 22:00 hour is used on both start and end of the splitted row.
Also, the same exact behaviour need to be replicated if the hour is 06:00
Can this behaviour be made using directly BigQuery?
Upvotes: 0
Views: 239
Reputation: 3113
I made a couple of minor adjustements on solution provided from @mikhail-berlyant, and here is the resulting query.
I flagged as EDIT
in the query the adjustments I made
#standardSQL
WITH
`project.dataset.table` AS (
-- across split point
SELECT "bar" AS id, TIME(18,0,0) AS start_time, TIME(23,0,0) AS end_time UNION ALL
-- end at split point
SELECT "foo", TIME(21,45,0), TIME(22,0,0) UNION ALL
-- start from split point
SELECT "fuz", TIME(22,0,0), TIME(23,30,0) UNION ALL
-- across multiple split points
SELECT "qux", TIME(05,45,0), TIME(23,30,0) UNION ALL
-- no split point
SELECT "quz1", TIME(23,30,0), TIME(23,45,0) UNION ALL
-- no split point
SELECT "quz2", TIME(02,0,0), TIME(05,59,0)
),
breaks AS (
SELECT break FROM UNNEST([TIME(6,0,0), TIME(22,0,0)]) break
),
temp AS (
SELECT
id,
start_time,
end_time,
ARRAY_AGG(break ORDER BY break) AS break
FROM `project.dataset.table` CROSS JOIN breaks
WHERE
-- [EDIT] >= instead of > to include rows that starts from split point (`fuz`)
break >= start_time
-- [EDIT] <= instead of < to include rows that ends at split (`foo`)
AND break <= end_time
GROUP BY id, start_time, end_time
)
SELECT
id,
point AS start_time,
IFNULL(next_point, point) AS end_time
FROM (
SELECT
id,
point,
LEAD(point) OVER(PARTITION BY id ORDER BY point) AS next_point
FROM
temp,
UNNEST(ARRAY_CONCAT([start_time], break, [end_time])) aS point
)
WHERE next_point IS NOT NULL
-- [EDIT] Remove data with same value as start_time/end time (generated from rows that starts from split point or ends at split point, foo/fuz)
AND point != next_point
-- [EDIT] temp table does not handle ids that does not cross to any split point, adding them from main table (`quz1` and `quz2`)
UNION ALL SELECT
id,
start_time,
end_time
FROM `project.dataset.table`
WHERE id NOT IN (SELECT id FROM temp)
And here is the result
+-----+------+------------+----------+---+
| Row | id | start_time | end_time | |
+-----+------+------------+----------+---+
| 1 | bar | 18:00:00 | 22:00:00 | |
| 2 | bar | 22:00:00 | 23:00:00 | |
| 3 | foo | 21:45:00 | 22:00:00 | |
| 4 | fuz | 22:00:00 | 23:30:00 | |
| 5 | qux | 05:45:00 | 06:00:00 | |
| 6 | qux | 06:00:00 | 22:00:00 | |
| 7 | qux | 22:00:00 | 23:30:00 | |
| 8 | quz1 | 23:30:00 | 23:45:00 | |
| 9 | quz2 | 02:00:00 | 05:59:00 | |
+-----+------+------------+----------+---+
EDIT
If the same input data are created as table (query with destination table)
-- across split point
SELECT "bar" AS id, TIME(18,0,0) AS start_time, TIME(23,0,0) AS end_time UNION ALL
-- end at split point
SELECT "foo", TIME(21,45,0), TIME(22,0,0) UNION ALL
-- start from split point
SELECT "fuz", TIME(22,0,0), TIME(23,30,0) UNION ALL
-- across multiple split points
SELECT "qux", TIME(05,45,0), TIME(23,30,0) UNION ALL
-- no split point
SELECT "quz1", TIME(23,30,0), TIME(23,45,0) UNION ALL
-- no split point
SELECT "quz2", TIME(02,0,0), TIME(05,59,0)
and then I use the very same table as input for the main query
`project.dataset.table` AS (
SELECT * FROM `myproject.mydataset.test`
),
executing the main query I'm getting the following error
Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
Upvotes: 0
Reputation: 172993
Below is for BigQuery Standard SQL and provides quite generic solution where you can have any number of break points as you need by "enlisting" them in breaks
expressions
#standardSQL
WITH `project.dataset.table` AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0) UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0) UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0) UNION ALL
SELECT "xxx", TIME(05,45,0), TIME(23,30,0)
), breaks AS (
SELECT break FROM UNNEST([TIME(6,0,0), TIME(10,0,0), TIME(22,0,0)]) break
), temp AS (
SELECT id, start_time, end_time,
ARRAY_AGG(break ORDER BY break) break
FROM `project.dataset.table` CROSS JOIN breaks
WHERE break > start_time AND break < end_time
GROUP BY id, start_time, end_time
)
SELECT id, point start_time,
IFNULL(next_point, point) end_time
FROM (
SELECT id, point,
LEAD(point) OVER(PARTITION BY id ORDER BY point) next_point
FROM temp, UNNEST(ARRAY_CONCAT([start_time], break, [end_time])) point
)
WHERE NOT next_point IS NULL
-- ORDER BY id, point
in above example I set three break points - 06:00, 10:00 and 22:00
and initial intervals as
Row id start_time end_time
1 bar 19:00:00 23:00:00
2 baz 08:00:00 11:00:00
3 foo 18:00:00 22:00:00
4 qux 05:30:00 07:30:00
5 xxx 05:45:00 23:30:00
and result is:
Row id start_time end_time
1 bar 19:00:00 22:00:00
2 bar 22:00:00 23:00:00
3 baz 08:00:00 10:00:00
4 baz 10:00:00 11:00:00
5 qux 05:30:00 06:00:00
6 qux 06:00:00 07:30:00
7 xxx 05:45:00 06:00:00
8 xxx 06:00:00 10:00:00
9 xxx 10:00:00 22:00:00
10 xxx 22:00:00 23:30:00
Upvotes: 3
Reputation: 172993
I would assume that you can have interval that consists both 06:00 and 22:00 hours so needs to be split on three intervals
Below example is for BigQuery Standard SQL and allows you to easily (but definitely not elegantly) extend approach to some reasonable (2-5) break points by cascading same logic for each break point like it is done below
#standardSQL
CREATE TEMP FUNCTION split_interval(start_time TIME, end_time TIME, break TIME)
RETURNS ARRAY<STRUCT<start_time TIME, end_time TIME>> AS (
IF(break > start_time AND break < end_time,
[STRUCT<start_time TIME, end_time TIME>(start_time, break), (break, end_time)],
[STRUCT<start_time TIME, end_time TIME>(start_time, end_time)]
)
);
WITH `project.dataset.table` AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0) UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0) UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0) UNION ALL
SELECT "xxx", TIME(05,45,0), TIME(23,30,0)
)
SELECT id, new_interval.* FROM (
SELECT id, new_interval.* FROM `project.dataset.table`,
UNNEST(split_interval(start_time, end_time, TIME(22,00,0))) new_interval
),
UNNEST(split_interval(start_time, end_time, TIME(06,00,0))) new_interval
-- ORDER BY id, start_time
in above example - original intervals are as below
Row id start_time end_time
1 bar 19:00:00 23:00:00
2 baz 08:00:00 11:00:00
3 foo 18:00:00 22:00:00
4 qux 05:30:00 07:30:00
5 xxx 05:45:00 23:30:00
and result is
Row id start_time end_time
1 bar 19:00:00 22:00:00
2 bar 22:00:00 23:00:00
3 baz 08:00:00 11:00:00
4 foo 18:00:00 22:00:00
5 qux 05:30:00 06:00:00
6 qux 06:00:00 07:30:00
7 xxx 05:45:00 06:00:00
8 xxx 06:00:00 22:00:00
9 xxx 22:00:00 23:30:00
as you can see here - if you have 3rd break point - you just need to add yet another outer select around existing - like below
SELECT id, new_interval.* FROM (
existing selects ...
),
UNNEST(split_interval(start_time, end_time, TIME(08,00,0))) new_interval
it is that part which makes above not elegant enough :o)
Upvotes: 3
Reputation: 14004
Here is one way to do it:
WITH TABLE AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time
UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0)
UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0)
UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0)
)
SELECT id, intervals.*
FROM TABLE,
UNNEST(IF(
start_time < TIME '22:00:00' AND end_time > TIME '22:00:00',
[STRUCT<start_time TIME, end_time TIME>(start_time, TIME '22:00:00'),(TIME '22:00:00',end_time)],
[STRUCT<start_time TIME, end_time TIME>(start_time, end_time)])) intervals
Upvotes: 1