Dave Anderson
Dave Anderson

Reputation: 755

Aggregating over an Event table based on time-window periods in configured in another table

I have three tables, UpEvent, DownEvent and AnalysisWindow

UpEvent:
up_event_id | event_date            |  EventMetric
1              2015-01-01T06:00:00       54
2              2015-01-01T07:30:00       76

DownEvent:
down_event_id | event_date          |  EventMetric
1              2015-01-01T06:46:00         22
2              2015-01-01T07:33:00         34

AnalysisWindow:
window_id |        win_start           |    win_end
1              2015-01-01T00:00:00       2015-01-01T04:00:00
2              2015-01-01T00:00:00       2015-01-01T08:00:00
.
.

I want to do analysis at each AnalysisWindow in order to aggregate the UpEvent's and DownEvent's that occurred between the defined window.

So for each AnalysisWindow record I would end up with 1 feature row:

WinStart             |  WinEnd               |   TotalUpEvents  |  TotalDownEvents
2015-01-01T00:00:00    2015-01-01T04:00:00         0                  0
2015-01-01T00:00:00    2015-01-01T08:00:00         2                  2

My first thought was to do something like

select win.win_start, 
       win.win_end, 
       count(ue.*), 
       sum(ue.EventMetric) 
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)

Which obviously doesn't work.

Am I approaching this problem incorrectly? I want to do a windowed analysis of the tables at various windows that I configure and get 1 aggregate record per window

Upvotes: 0

Views: 43

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269443

One method uses correlated subqueries:

select aw.*,
       (select count(*)
        from UpEvent ue
        where ue.event_date between aw.win_start and aw.win_end)
       ) as ups,
       (select count(*)
        from DownEvent de
        where de.event_date between aw.win_start and aw.win_end)
       ) as downs
from AnalysisWindow aw;

The above works, at least when formulated as:

with UpEvent as (
      select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
      select 2, '2015-01-01T07:30:00', 76
     ),
     DownEvent as (
      select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
      select 2, '2015-01-01T07:33:00', 34
     ),
     AnalysisWindow as (
      select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
      select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
     )
select aw.*,
       (select count(*)
        from UpEvent ue
        where ue.event_date between aw.win_start and aw.win_end
       ) as ups,
       (select count(*)
        from DownEvent de
        where de.event_date between aw.win_start and aw.win_end
       ) as downs
from AnalysisWindow aw;

The alternative is to use union all:

 ud as (
  select event_date, 1 as ups, 0 as downs from upevent
  union all
  select event_date, 0 as ups, 1 as downs from downevent
 )
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
     ud
     ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL (and actually works!)

#standardSQL
WITH ue_win AS ( 
  SELECT 
    window_id, COUNT(1) TotalUpEvents  
  FROM `project.dataset.AnalysisWindow` win
  CROSS JOIN `project.dataset.UpEvent` ue
  WHERE ue.event_date BETWEEN win.win_start AND win.win_end
  GROUP BY window_id
), de_win AS ( 
  SELECT 
    window_id, COUNT(1) TotalDownEvents  
  FROM `project.dataset.AnalysisWindow` win
  CROSS JOIN `project.dataset.DownEvent` de
  WHERE de.event_date BETWEEN win.win_start AND win.win_end
  GROUP BY window_id
)
SELECT 
  window_id, win_start, win_end, 
  IFNULL(TotalUpEvents, 0) TotalUpEvents, 
  IFNULL(TotalDownEvents, 0) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
LEFT JOIN ue_win USING(window_id)
LEFT JOIN de_win USING(window_id)

Upvotes: 2

Related Questions