kee
kee

Reputation: 11619

BigQuery: how to attribute minutes to individual days

I have a table with following schema in BigQuery:

deviceId int,
state int,
started timestamp,
duration minutes

A row here means that this device has this state starting from started for the next minutes. The minutes here can be over a few days.

Now what I want to know per day how many minutes this deviceId had this state. So a row in the above table can be expressed as a list of the following:

deviceId int,
state int,
date date,
duration minutes

Here is some sample table I was playing with:

WITH `temp.test` AS (
  SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts, 120 minutes UNION ALL
  SELECT 1, 2, '2018-08-17 12:40:00', 120 UNION ALL
  SELECT 1, 1, '2018-08-17 14:40:00', 560 UNION ALL
  SELECT 2, 1, '2018-08-17 09:00:00', 180 UNION ALL
  SELECT 2, 2, '2018-08-17 12:00:00', 2940 
)

In the case of the last row, its duration 2940 minutes which is 49 hours (2 days and 1 hour) so this needs to be transformed into multiple rows. The output I want from this example is the following:

1, 1, 2018-08-17, 120
1, 2, 2018-08-17, 120
1, 1, 2018-08-17, 560
2, 1, 2018-08-17, 180
2, 2, 2018-08-17, 720 
2, 2, 2018-08-17, 1440
2, 2, 2018-08-17, 780 

Is there a way to achieve this in SQL or should I write a UDF?

Upvotes: 0

Views: 115

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below example is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts, 120 minutes UNION ALL
  SELECT 1, 2, '2018-08-17 12:40:00', 120 UNION ALL
  SELECT 1, 1, '2018-08-17 14:40:00', 560 UNION ALL
  SELECT 2, 1, '2018-08-17 09:00:00', 180 UNION ALL
  SELECT 2, 2, '2018-08-17 12:00:00', 2940 
)
SELECT 
  id, state, day,
  CASE 
    WHEN day = first_day AND day = last_day THEN minutes
    WHEN day = first_day THEN 24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP(first_day), MINUTE)
    WHEN day = last_day THEN TIMESTAMP_DIFF(TIMESTAMP_ADD(ts, INTERVAL minutes MINUTE), TIMESTAMP(last_day), MINUTE)
    ELSE 24*60
  END duration
FROM `project.dataset.table`, 
UNNEST(GENERATE_DATE_ARRAY(DATE(ts), DATE(TIMESTAMP_ADD(ts, INTERVAL minutes - 1 MINUTE)), INTERVAL 1 DAY)) day,
UNNEST([STRUCT<first_day DATE, last_day DATE>(DATE(ts), DATE(TIMESTAMP_ADD(ts, INTERVAL minutes MINUTE)))]) key

with result:

Row id  state   day         duration     
1   1   1       2018-08-17  120  
2   1   2       2018-08-17  120  
3   1   1       2018-08-17  560  
4   2   1       2018-08-17  180  
5   2   2       2018-08-17  720  
6   2   2       2018-08-18  1440     
7   2   2       2018-08-19  780     

The only what you need to tune (I think) is using minutes - 1 vs minutes in below

TIMESTAMP_ADD(ts, INTERVAL minutes - 1 MINUTE)  

this is to handle case when end of period come to start of the day - making duration of 0(zero) for that day

Upvotes: 3

Related Questions