Reputation: 3634
I want to combine adjoining date ranges in BigQuery.
I have a table like this:
ID START END
1 2019-01-18 17:34:58 UTC 2019-02-18 12:14:59 UTC
1 2019-02-18 06:04:39 UTC 2019-02-18 08:05:05 UTC
1 2019-02-18 08:05:05 UTC 2019-02-18 10:06:05 UTC
1 2019-02-18 10:06:05 UTC 2019-02-19 11:16:15 UTC
2 2019-01-19 06:02:29 UTC 2019-01-29 11:02:23 UTC
Since the three middle rows represent a single range split into three pieces, I want to combine them so the table looks like this:
ID START END
1 2019-01-18 17:34:58 UTC 2019-02-18 12:14:59 UTC
1 2019-02-18 06:04:39 UTC 2019-02-19 11:16:15 UTC
2 2019-01-19 06:02:29 UTC 2019-01-29 11:02:23 UTC
What is the best way to accomplish this?
Upvotes: 0
Views: 105
Reputation: 1270431
You need to determine where the ranges start. In your case, they seem to have exact matching end and starts, so you can use lag()
to identify where groups start. A cumulative count of the starts provides a grouping id, which can be used for aggregation:
select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
from (select t.*, lag(end) over (partition by id order by start) as prev_end
from t
) t
) t
group by id, grp;
If the groups can overlaps, then a cumulative maximum usually does the trick:
select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
from (select t.*,
max(end) over (partition by id order by start rows between unbounded preceding and 1 preceding) as prev_end
from t
) t
) t
group by id, grp;
Upvotes: 1