Reputation: 755
I have three tables, UpEvent, DownEvent and AnalysisWindow
UpEvent:
up_event_id | event_date | EventMetric
1 2015-01-01T06:00:00 54
2 2015-01-01T07:30:00 76
DownEvent:
down_event_id | event_date | EventMetric
1 2015-01-01T06:46:00 22
2 2015-01-01T07:33:00 34
AnalysisWindow:
window_id | win_start | win_end
1 2015-01-01T00:00:00 2015-01-01T04:00:00
2 2015-01-01T00:00:00 2015-01-01T08:00:00
.
.
I want to do analysis at each AnalysisWindow in order to aggregate the UpEvent's and DownEvent's that occurred between the defined window.
So for each AnalysisWindow record I would end up with 1 feature row:
WinStart | WinEnd | TotalUpEvents | TotalDownEvents
2015-01-01T00:00:00 2015-01-01T04:00:00 0 0
2015-01-01T00:00:00 2015-01-01T08:00:00 2 2
My first thought was to do something like
select win.win_start,
win.win_end,
count(ue.*),
sum(ue.EventMetric)
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)
Which obviously doesn't work.
Am I approaching this problem incorrectly? I want to do a windowed analysis of the tables at various windows that I configure and get 1 aggregate record per window
Upvotes: 0
Views: 43
Reputation: 1269443
One method uses correlated subqueries:
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end)
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end)
) as downs
from AnalysisWindow aw;
The above works, at least when formulated as:
with UpEvent as (
select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
select 2, '2015-01-01T07:30:00', 76
),
DownEvent as (
select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
select 2, '2015-01-01T07:33:00', 34
),
AnalysisWindow as (
select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
)
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
The alternative is to use union all
:
ud as (
select event_date, 1 as ups, 0 as downs from upevent
union all
select event_date, 0 as ups, 1 as downs from downevent
)
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
ud
ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)
Upvotes: 0
Reputation: 172944
Below is for BigQuery Standard SQL (and actually works!)
#standardSQL
WITH ue_win AS (
SELECT
window_id, COUNT(1) TotalUpEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.UpEvent` ue
WHERE ue.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
), de_win AS (
SELECT
window_id, COUNT(1) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
CROSS JOIN `project.dataset.DownEvent` de
WHERE de.event_date BETWEEN win.win_start AND win.win_end
GROUP BY window_id
)
SELECT
window_id, win_start, win_end,
IFNULL(TotalUpEvents, 0) TotalUpEvents,
IFNULL(TotalDownEvents, 0) TotalDownEvents
FROM `project.dataset.AnalysisWindow` win
LEFT JOIN ue_win USING(window_id)
LEFT JOIN de_win USING(window_id)
Upvotes: 2