Justin
Justin

Reputation: 3634

Combining ranges in SQL/BigQuery

I want to combine adjoining date ranges in BigQuery.

I have a table like this:

ID   START                     END
1    2019-01-18 17:34:58 UTC   2019-02-18 12:14:59 UTC
1    2019-02-18 06:04:39 UTC   2019-02-18 08:05:05 UTC
1    2019-02-18 08:05:05 UTC   2019-02-18 10:06:05 UTC
1    2019-02-18 10:06:05 UTC   2019-02-19 11:16:15 UTC
2    2019-01-19 06:02:29 UTC   2019-01-29 11:02:23 UTC

Since the three middle rows represent a single range split into three pieces, I want to combine them so the table looks like this:

ID   START                     END
1    2019-01-18 17:34:58 UTC   2019-02-18 12:14:59 UTC
1    2019-02-18 06:04:39 UTC   2019-02-19 11:16:15 UTC
2    2019-01-19 06:02:29 UTC   2019-01-29 11:02:23 UTC

What is the best way to accomplish this?

Upvotes: 0

Views: 105

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270431

You need to determine where the ranges start. In your case, they seem to have exact matching end and starts, so you can use lag() to identify where groups start. A cumulative count of the starts provides a grouping id, which can be used for aggregation:

select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
      from (select t.*, lag(end) over (partition by id order by start) as prev_end
            from t
           ) t
     ) t
group by id, grp;

If the groups can overlaps, then a cumulative maximum usually does the trick:

select id, min(start) as start, max(end) as end
from (select t.*, countif(prev_end is null or prev_end <> start) over (partition by id order by start) as grp
      from (select t.*,
                   max(end) over (partition by id order by start rows between unbounded preceding and 1 preceding) as prev_end
            from t
           ) t
     ) t
group by id, grp;

Upvotes: 1

Related Questions