Deviling Master
Deviling Master

Reputation: 3113

Split TIME interval around 22:00 time in 2 rows

I have this table

+-----+-----+------------+----------+---+
| Row | id  | start_time | end_time |   |
+-----+-----+------------+----------+---+
|   1 | foo | 18:00:00   | 22:00:00 |   |
|   2 | bar | 19:00:00   | 23:00:00 |   |
|   3 | baz | 08:00:00   | 11:00:00 |   |
|   4 | qux | 05:30:00   | 07:30:00 |   |
+-----+-----+------------+----------+---+

which can be generated with this

WITH TABLE AS (
  SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time
  UNION ALL
  SELECT "bar", TIME(19,0,0), TIME(23,0,0)
  UNION ALL
  SELECT "baz", TIME(08,0,0), TIME(11,0,0)
  UNION ALL
  SELECT "qux", TIME(05,30,0), TIME(07,30,0)
)

SELECT * FROM TABLE

If the 22:00 hour is inside the time interval, the row need to be splitted around the hour itself.

For the given input, the output should be:

+-----+-----+------------+----------+---+
| Row | id  | start_time | end_time |   |
+-----+-----+------------+----------+---+
|   1 | foo | 18:00:00   | 22:00:00 |   |
|   2 | bar | 19:00:00   | 22:00:00 |   |
|   3 | bar | 22:00:00   | 23:00:00 |   |
|   4 | baz | 08:00:00   | 11:00:00 |   |
|   5 | qux | 05:30:00   | 06:00:00 |   |
|   6 | qux | 06:00:00   | 07:30:00 |   |
+-----+-----+------------+----------+---+

Please note that 22:00 hour is used on both start and end of the splitted row.

Also, the same exact behaviour need to be replicated if the hour is 06:00

Can this behaviour be made using directly BigQuery?

Upvotes: 0

Views: 239

Answers (4)

Deviling Master
Deviling Master

Reputation: 3113

I made a couple of minor adjustements on solution provided from @mikhail-berlyant, and here is the resulting query.

I flagged as EDIT in the query the adjustments I made

#standardSQL
WITH

`project.dataset.table` AS (
  -- across split point
  SELECT "bar" AS id, TIME(18,0,0) AS start_time, TIME(23,0,0) AS end_time UNION ALL
  -- end at split point
  SELECT "foo", TIME(21,45,0), TIME(22,0,0) UNION ALL
  -- start from split point
  SELECT "fuz", TIME(22,0,0), TIME(23,30,0) UNION ALL
  -- across multiple split points
  SELECT "qux", TIME(05,45,0), TIME(23,30,0) UNION ALL
  -- no split point
  SELECT "quz1", TIME(23,30,0), TIME(23,45,0) UNION ALL
  -- no split point
  SELECT "quz2", TIME(02,0,0), TIME(05,59,0)  
),

breaks AS (
  SELECT break FROM UNNEST([TIME(6,0,0), TIME(22,0,0)]) break
),

temp AS (
  SELECT
    id,
    start_time,
    end_time, 
    ARRAY_AGG(break ORDER BY break) AS break
  FROM `project.dataset.table` CROSS JOIN breaks 
  WHERE
    -- [EDIT] >= instead of > to include rows that starts from split point (`fuz`)
    break >= start_time   
    -- [EDIT] <= instead of < to include rows that ends at split  (`foo`)
    AND break <= end_time  
  GROUP BY id, start_time, end_time
)

SELECT
  id,
  point AS start_time,
  IFNULL(next_point, point) AS end_time

FROM (
  SELECT 
    id,
    point, 
    LEAD(point) OVER(PARTITION BY id ORDER BY point) AS next_point
  FROM
    temp,
    UNNEST(ARRAY_CONCAT([start_time], break, [end_time])) aS point
)

WHERE next_point IS NOT NULL

-- [EDIT] Remove data with same value as start_time/end time (generated from rows that starts from split point or ends at split point, foo/fuz)
AND point != next_point


-- [EDIT] temp table does not handle ids that does not cross to any split point, adding them from main table (`quz1` and `quz2`)
UNION ALL SELECT
  id,
  start_time,
  end_time
FROM  `project.dataset.table`
WHERE id NOT IN (SELECT id FROM temp)

And here is the result

+-----+------+------------+----------+---+
| Row |  id  | start_time | end_time |   |
+-----+------+------------+----------+---+
|   1 | bar  | 18:00:00   | 22:00:00 |   |
|   2 | bar  | 22:00:00   | 23:00:00 |   |
|   3 | foo  | 21:45:00   | 22:00:00 |   |
|   4 | fuz  | 22:00:00   | 23:30:00 |   |
|   5 | qux  | 05:45:00   | 06:00:00 |   |
|   6 | qux  | 06:00:00   | 22:00:00 |   |
|   7 | qux  | 22:00:00   | 23:30:00 |   |
|   8 | quz1 | 23:30:00   | 23:45:00 |   |
|   9 | quz2 | 02:00:00   | 05:59:00 |   |
+-----+------+------------+----------+---+

EDIT

If the same input data are created as table (query with destination table)

  -- across split point
  SELECT "bar" AS id, TIME(18,0,0) AS start_time, TIME(23,0,0) AS end_time UNION ALL
  -- end at split point
  SELECT "foo", TIME(21,45,0), TIME(22,0,0) UNION ALL
  -- start from split point
  SELECT "fuz", TIME(22,0,0), TIME(23,30,0) UNION ALL
  -- across multiple split points
  SELECT "qux", TIME(05,45,0), TIME(23,30,0) UNION ALL
  -- no split point
  SELECT "quz1", TIME(23,30,0), TIME(23,45,0) UNION ALL
  -- no split point
  SELECT "quz2", TIME(02,0,0), TIME(05,59,0)  

and then I use the very same table as input for the main query

`project.dataset.table` AS (
  SELECT * FROM `myproject.mydataset.test`
),

executing the main query I'm getting the following error

Error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL and provides quite generic solution where you can have any number of break points as you need by "enlisting" them in breaks expressions

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time UNION ALL
  SELECT "bar", TIME(19,0,0), TIME(23,0,0) UNION ALL
  SELECT "baz", TIME(08,0,0), TIME(11,0,0) UNION ALL
  SELECT "qux", TIME(05,30,0), TIME(07,30,0) UNION ALL
  SELECT "xxx", TIME(05,45,0), TIME(23,30,0)
), breaks AS (
  SELECT break FROM UNNEST([TIME(6,0,0), TIME(10,0,0), TIME(22,0,0)]) break
), temp AS (
  SELECT id, start_time, end_time, 
    ARRAY_AGG(break ORDER BY break) break
  FROM `project.dataset.table` CROSS JOIN breaks 
  WHERE break > start_time AND break < end_time 
  GROUP BY id, start_time, end_time
)
SELECT id, point start_time, 
  IFNULL(next_point, point) end_time
FROM (
  SELECT id, point, 
    LEAD(point) OVER(PARTITION BY id ORDER BY point) next_point
  FROM temp, UNNEST(ARRAY_CONCAT([start_time], break, [end_time])) point

)
WHERE NOT next_point IS NULL
-- ORDER BY id, point  

in above example I set three break points - 06:00, 10:00 and 22:00

and initial intervals as

Row id      start_time  end_time     
1   bar     19:00:00    23:00:00     
2   baz     08:00:00    11:00:00     
3   foo     18:00:00    22:00:00     
4   qux     05:30:00    07:30:00     
5   xxx     05:45:00    23:30:00    

and result is:

Row id      start_time  end_time     
1   bar     19:00:00    22:00:00     
2   bar     22:00:00    23:00:00     
3   baz     08:00:00    10:00:00     
4   baz     10:00:00    11:00:00     
5   qux     05:30:00    06:00:00     
6   qux     06:00:00    07:30:00     
7   xxx     05:45:00    06:00:00     
8   xxx     06:00:00    10:00:00     
9   xxx     10:00:00    22:00:00     
10  xxx     22:00:00    23:30:00     

Upvotes: 3

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

I would assume that you can have interval that consists both 06:00 and 22:00 hours so needs to be split on three intervals

Below example is for BigQuery Standard SQL and allows you to easily (but definitely not elegantly) extend approach to some reasonable (2-5) break points by cascading same logic for each break point like it is done below

#standardSQL
CREATE TEMP FUNCTION split_interval(start_time TIME, end_time TIME, break TIME) 
RETURNS ARRAY<STRUCT<start_time TIME, end_time TIME>> AS (
  IF(break > start_time AND break < end_time, 
    [STRUCT<start_time TIME, end_time TIME>(start_time, break), (break, end_time)], 
    [STRUCT<start_time TIME, end_time TIME>(start_time, end_time)]
  )
);
WITH `project.dataset.table` AS (
  SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time UNION ALL
  SELECT "bar", TIME(19,0,0), TIME(23,0,0) UNION ALL
  SELECT "baz", TIME(08,0,0), TIME(11,0,0) UNION ALL
  SELECT "qux", TIME(05,30,0), TIME(07,30,0) UNION ALL
  SELECT "xxx", TIME(05,45,0), TIME(23,30,0)
)
SELECT id, new_interval.* FROM (
  SELECT id, new_interval.* FROM `project.dataset.table`,
  UNNEST(split_interval(start_time, end_time, TIME(22,00,0))) new_interval
),
UNNEST(split_interval(start_time, end_time, TIME(06,00,0))) new_interval
-- ORDER BY id, start_time  

in above example - original intervals are as below

Row id      start_time  end_time     
1   bar     19:00:00    23:00:00     
2   baz     08:00:00    11:00:00     
3   foo     18:00:00    22:00:00     
4   qux     05:30:00    07:30:00     
5   xxx     05:45:00    23:30:00     

and result is

Row id      start_time  end_time     
1   bar     19:00:00    22:00:00     
2   bar     22:00:00    23:00:00     
3   baz     08:00:00    11:00:00     
4   foo     18:00:00    22:00:00     
5   qux     05:30:00    06:00:00     
6   qux     06:00:00    07:30:00     
7   xxx     05:45:00    06:00:00     
8   xxx     06:00:00    22:00:00     
9   xxx     22:00:00    23:30:00     

as you can see here - if you have 3rd break point - you just need to add yet another outer select around existing - like below

SELECT id, new_interval.* FROM (
  existing selects ...
),
UNNEST(split_interval(start_time, end_time, TIME(08,00,0))) new_interval  

it is that part which makes above not elegant enough :o)

Upvotes: 3

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14004

Here is one way to do it:

WITH TABLE AS (
  SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time
  UNION ALL
  SELECT "bar", TIME(19,0,0), TIME(23,0,0)
  UNION ALL
  SELECT "baz", TIME(08,0,0), TIME(11,0,0)
  UNION ALL
  SELECT "qux", TIME(05,30,0), TIME(07,30,0)
)
SELECT id, intervals.* 
FROM TABLE, 
  UNNEST(IF(
    start_time < TIME '22:00:00' AND end_time > TIME '22:00:00',
    [STRUCT<start_time TIME, end_time TIME>(start_time, TIME '22:00:00'),(TIME '22:00:00',end_time)],
    [STRUCT<start_time TIME, end_time TIME>(start_time, end_time)])) intervals 

Upvotes: 1

Related Questions