Split a row that has data across several days by individual date (BigQuery)

Question

I have a dataset that contains information regarding a specific type of event, includes starttime, endtime, and duration (in minutes). I would like to group this data by date (will be extracted from starttime), but I have a row that represents a event go across several days, for example, index 33, starttime 2020-07-13 10:19:54 UTC, endtime 2020-07-15 13:13:21 UTC, and duration 3053. This row should be broke to three rows, for instance:

starttime          endtime            duration
2020-07-13 10:19   2020-07-13 23:59    821
2020-07-14 00:00   2020-07-14 23:59   1440
2020-07-15 00:00   2020-07-15 13:13    793

To be clear, I am trying to use these 3 rows to replace the original row, so when I group by date, the number will be correct.

Screenshot of the dataset

How do I do this in BigQuery?

Gordon Linoff · Accepted Answer

Use genereate_date_array() and unnest():

select t.ind,
       greatest(start_ts, timestamp(dt)) as start_ts,
       least(timestamp(date_add(dt, interval 1 day)), end_ts)
from (select 32 as ind, timestamp('2020-07-13 10:19:54 UTC') as start_ts,  timestamp('2020-07-13 13:13:21 UTC') as end_ts union all
      select 33 as ind, timestamp('2020-07-13 10:19:54 UTC') as start_ts,  timestamp('2020-07-15 13:13:21 UTC') as end_ts
     ) t cross join
     unnest(generate_date_array(date(start_ts), date(end_ts))) dt ;

Note: This gives the ending timestamp as midnight rather than one minute or one second before. That way, the intermediate days have 24 hours. Of course, you can subtracts a second or minute, but that seems less accurate to me.

Split a row that has data across several days by individual date (BigQuery)

Answers (1)

Related Questions