Reputation: 11619
I have a table with following schema in BigQuery:
deviceId int,
state int,
started timestamp,
duration minutes
A row here means that this device has this state starting from started for the next minutes. The minutes here can be over a few days.
Now what I want to know per day how many minutes this deviceId had this state. So a row in the above table can be expressed as a list of the following:
deviceId int,
state int,
date date,
duration minutes
Here is some sample table I was playing with:
WITH `temp.test` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts, 120 minutes UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00', 120 UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00', 560 UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00', 180 UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00', 2940
)
In the case of the last row, its duration 2940 minutes which is 49 hours (2 days and 1 hour) so this needs to be transformed into multiple rows. The output I want from this example is the following:
1, 1, 2018-08-17, 120
1, 2, 2018-08-17, 120
1, 1, 2018-08-17, 560
2, 1, 2018-08-17, 180
2, 2, 2018-08-17, 720
2, 2, 2018-08-17, 1440
2, 2, 2018-08-17, 780
Is there a way to achieve this in SQL or should I write a UDF?
Upvotes: 0
Views: 115
Reputation: 172974
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts, 120 minutes UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00', 120 UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00', 560 UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00', 180 UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00', 2940
)
SELECT
id, state, day,
CASE
WHEN day = first_day AND day = last_day THEN minutes
WHEN day = first_day THEN 24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP(first_day), MINUTE)
WHEN day = last_day THEN TIMESTAMP_DIFF(TIMESTAMP_ADD(ts, INTERVAL minutes MINUTE), TIMESTAMP(last_day), MINUTE)
ELSE 24*60
END duration
FROM `project.dataset.table`,
UNNEST(GENERATE_DATE_ARRAY(DATE(ts), DATE(TIMESTAMP_ADD(ts, INTERVAL minutes - 1 MINUTE)), INTERVAL 1 DAY)) day,
UNNEST([STRUCT<first_day DATE, last_day DATE>(DATE(ts), DATE(TIMESTAMP_ADD(ts, INTERVAL minutes MINUTE)))]) key
with result:
Row id state day duration
1 1 1 2018-08-17 120
2 1 2 2018-08-17 120
3 1 1 2018-08-17 560
4 2 1 2018-08-17 180
5 2 2 2018-08-17 720
6 2 2 2018-08-18 1440
7 2 2 2018-08-19 780
The only what you need to tune (I think) is using minutes - 1
vs minutes
in below
TIMESTAMP_ADD(ts, INTERVAL minutes - 1 MINUTE)
this is to handle case when end of period come to start of the day - making duration of 0(zero) for that day
Upvotes: 3