Reputation: 145
I have data like this
Start End
2022-01-01 2022-01-31
2022-02-01 2022-02-28
2022-03-05 2022-04-05
2022-05-01 2022-05-31
2022-06-01 2022-06-30
2022-07-01 2022-07-15
2022-07-05 2022-07-16
2022-07-15 2022-08-01
I want to find all continuous periods (where there is no more than 1 day between the start and the end) An example of the desired result
2022-01-01 2022-02-28
2022-03-05 2022-04-05
2022-05-01 2022-08-01
Upvotes: 0
Views: 128
Reputation: 172994
i`m add some new data to my example. Periods can cross each other
in this case use dense_rank
instead of row_number
select min(day) as start, max(day) as `end`
from (
select day, unix_date(day) - dense_rank() over(order by day) as grp
from your_table, unnest(generate_date_array(date(start), date(`end`))) day
)
group by grp
with output
Upvotes: 0
Reputation: 3528
For overlapping date ranges, the best way is to create each day in these ranges and combinde them. Searching for date gaps is the next step. However, this generates a huge amount of rows, depending on the date range. See also the solution of @mikhail-berlyant
with tbl as
(
Select date("2022-01-01") as start,date("2022-01-31") as ending
union all Select date("2022-02-01"),date("2022-02-28")
union all Select date("2022-03-02"),date("2022-03-02")
union all Select date("2022-03-05"),date("2022-04-05")
union all Select date("2022-05-01"),date("2022-05-31")
union all Select date("2022-06-01"),date("2022-06-30")
union all Select date("2022-07-01"),date("2022-07-12")
union all Select date("2022-06-05"),date("2022-07-16")
union all Select date("2022-07-15"),date("2022-08-01")
),
tmp as (
Select toDAY,
date_diff(today,lag(today) over (order by today),day)-1 as diff
from (Select DISTINCT toDAY from tbl, unnest(generate_date_array(start,ending)) as toDAY)
),
tmp2 as (
Select toDay,
sum(ifnull(diff,0)) over (order by today range between unbounded preceding and current row) as grp
from tmp
)
select
grp, min(today),max(today) from tmp2
group by 1
For non overlapping ranges, I suggested this solution:
There is the need to combine the value between rows. LEAD
and LAG
are these window
functions. With these the comparision of the start date with the previous end date is possible. The same is done for the end date with the next start date. Thus we obain two columns A
and B_
which have only an entrie for breaks.
We need to filter all rows where A
and B_
have both nulls, qualify
helps here to prevent a further select
statement. The A
and B_
dates are now split often over two rows, therefore the end date in column B
is put in the above row, if this one is null: ifnull(B_,lead(B_) over (order by start))
with tbl as
(
Select date("2022-01-01") as start,date("2022-01-31") as ending
union all Select date("2022-02-01"),date("2022-02-28")
union all Select date("2022-03-05"),date("2022-04-05")
union all Select date("2022-05-01"),date("2022-05-31")
union all Select date("2022-06-01"),date("2022-06-30")
union all Select date("2022-07-01"),date("2022-07-15")
)
Select A, ifnull(B_,lead(B_) over (order by start)) B
from
(
Select *,
if(ifnull(date_diff(start,lag(ending) over (order by Start),day),99)>1,start,null) A,
if(ifnull(date_diff(ending,lead(start) over (order by Start),day),-99)<-1,ending,null) B_
from tbl
Qualify A is not null or B_ is not null
#order by Start
)
Qualify A is not null
order by A
Upvotes: 2
Reputation: 12234
Consider below approach.
WITH partitions AS (
SELECT *, COUNTIF(flag) OVER (ORDER BY start) par FROM (
SELECT *, DATE_DIFF(start, LAG(`end`) OVER (ORDER BY start), DAY) > 1 flag
FROM sample_table
)
)
SELECT MIN(start) start, MAX(`end`) `end`
FROM partitions
GROUP BY par;
+------------+------------+
| start | end |
+------------+------------+
| 2022-01-01 | 2022-02-28 |
| 2022-03-05 | 2022-04-05 |
| 2022-05-01 | 2022-08-01 |
+------------+------------+
Upvotes: 1
Reputation: 172994
Consider yet another approach
select min(day) as start, max(day) as `end`
from (
select day, unix_date(day) - row_number() over(order by day) as grp
from your_table, unnest(generate_date_array(date(start), date(`end`))) day
)
group by grp
if applied to sample data in your question - output is
Upvotes: 2