Nick
Nick

Reputation: 55

SQL (BigQuery) Grouping Runtime Per Day

I have the following data which I want to group into seconds per day in BigQuery.
Source Table:

+--------------+---------------------+---------------------+
| ComputerName | StartDatetime       | EndDatetime         |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 |
+--------------+---------------------+---------------------+
| Computer2    | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 |
+--------------+---------------------+---------------------+

I want to be able to visualise the data in the following way

+------------+--------------+------------------+
| Date       | ComputerName | Runtime(Seconds) |
+------------+--------------+------------------+
| 2020-10-10 | Computer1    | 5089             |
+------------+--------------+------------------+
| 2020-10-11 | Computer1    | 19308            |
+------------+--------------+------------------+
| 2020-10-08 | Computer2    | 53285            |
+------------+--------------+------------------+
| 2020-10-09 | Computer2    | 86400            |
+------------+--------------+------------------+
| 2020-10-10 | Computer2    | 41787            |
+------------+--------------+------------------+

I am not too sure of the way I should approach this. Some input would be greatly appreciated.

Upvotes: 2

Views: 72

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Another option for BigQuery Standard SQL

Straightforward, "little silly" and almost logic-less option of just "stupidly" counting seconds in respective days - still looks like an option to me

#standardSQL
select Date, ComputerName, 
    countif(second >= timestamp(StartDatetime) and second < timestamp(EndDatetime)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date,
unnest(generate_timestamp_array(timestamp(Date + 1), timestamp(Date), interval -1 second)) second with offset
where offset > 0
group by Date, ComputerName

if applied to sample data from your question - output is

enter image description here

Upvotes: 2

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
select Date, ComputerName, 
  sum(datetime_diff(
    least(datetime (Date + 1), EndDatetime),
    greatest(datetime(Date), StartDatetime),
    second
  )) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName

if to apply to sample data in your question - as in below example

#standardSQL
with `project.dataset.table` as (
  select 'Computer1' ComputerName, datetime '2020-06-10T21:01:28' StartDatetime, datetime '2020-06-10T21:20:19' EndDatetime union all
  select 'Computer1', '2020-06-10T22:54:01', '2020-06-11T05:21:48' union all
  select 'Computer2', '2020-06-08T09:11:54', '2020-06-10T11:36:27' 
)
select Date, ComputerName, 
  sum(datetime_diff(
    least(datetime (Date + 1), EndDatetime),
    greatest(datetime(Date), StartDatetime),
    second
  )) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName

output is

enter image description here

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269853

This is an interval overlap problem. You can solve this by splitting each time period into separate days and then looking at the overlap for each day:

with t as (
      select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
      select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
      select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime 
     )
select dte, t.computername,
       sum(case when enddatetime >= dte and
                     startdatetime < date_add(dte, interval 1 day)
                then datetime_diff(least(date_add(dte, interval 1 day), enddatetime), 
                                   greatest(dte, startdatetime),
                                   second)
           end) as runtime_seconds
from (select t.*, 
             generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
      from t 
     ) t cross join
     unnest(gda) dte
group by dte, t.computername;

Upvotes: 2

Related Questions