Reputation: 31
I want to join a table containing a series of sub-events to the first one's id, so i can aggregate on the complete event later. Couldn't find a command that does that in bigquery sql.
Alternatively, it would help if I could create a joint table of each start sub-event to its respective ending, so I can then check for each row when it's between the start and the ending of a session, to determine which session it's related to.
If anyone knows how to do one of the two options, it would be a great help. Thanks!
The table looks like that:
user id time part
111 1 01/03/2020 09:00 Start
111 2 01/03/2020 09:01 Middle
111 3 01/03/2020 09:03 end
111 4 01/03/2020 19:00 start
111 5 01/03/2020 19:10 Middle
111 6 01/03/2020 19:11 Middle
111 7 01/03/2020 19:20 End
222 8 02/03/2020 07:32 Start
222 9 02/03/2020 07:35 End
222 10 03/03/2020 20:40 Start
222 11 03/03/2020 20:45 end
and I want it to look like that:
user id time part
111 1 01/03/2020 09:00 Start
111 1 01/03/2020 09:01 Middle
111 1 01/03/2020 09:03 end
111 4 01/03/2020 19:00 start
111 4 01/03/2020 19:10 Middle
111 4 01/03/2020 19:11 Middle
111 4 01/03/2020 19:20 End
222 8 02/03/2020 07:32 Start
222 8 02/03/2020 07:35 End
222 10 03/03/2020 20:40 Start
222 10 03/03/2020 20:45 end
Upvotes: 1
Views: 650
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT user,
MIN(id) OVER(PARTITION BY user, grp) AS id,
time, part
FROM (
SELECT *,
COUNTIF(LOWER(part) = 'start') OVER(PARTITION BY user ORDER BY time) grp
FROM `project.dataset.table`
)
if to apply to sample data from your question - result is
Row user id time part
1 111 1 01/03/2020 09:00 Start
2 111 1 01/03/2020 09:01 Middle
3 111 1 01/03/2020 09:03 end
4 111 4 01/03/2020 19:00 start
5 111 4 01/03/2020 19:10 Middle
6 111 4 01/03/2020 19:11 Middle
7 111 4 01/03/2020 19:20 End
8 222 8 02/03/2020 07:32 Start
9 222 8 02/03/2020 07:35 End
10 222 10 03/03/2020 20:40 Start
11 222 10 03/03/2020 20:45 end
Upvotes: 2