Reputation: 755
Is it possible to group records based on a predefined date range differences (e.g. 30 days) based on the start_date
of a row and the end_date
of the previous row for non-consecutive dates? I want to take the min(start_date)
and max(end_date)
of each group. I tried the lead
and lag
function with partition by in Oracle but couldn't come up with a proper solution. A related but unanswered post related to my question can be found here.
E.g.
ROW_NUM PROJECT_ID START_DATE END_DATE
1 1 2016-01-14 2016-08-15
2 1 2016-08-16 2016-09-10 --- Date diff Row 1&2 = 1 Day
3 1 2016-11-15 2017-01-10 --- Date diff Row 2&3 = 66 Days
4 1 2016-01-17 2017-04-10 --- Date diff Row 3&4 = 7 Days
5 2 2018-04-28 2018-06-01 --- Other Project
6 2 2019-02-01 2019-04-05 --- Diff > 30 Days
7 2 2019-04-08 2019-07-28 --- Diff 3 Days
Expected Result:
ROW_NUM PROJECT_ID START_DATE END_DATE
1 1 2016-01-14 2016-09-10
3 1 2016-11-15 2017-04-10
5 2 2018-04-28 2018-06-01
6 2 2019-02-01 2019-07-28
Upvotes: 1
Views: 41
Reputation: 1269523
Use lag()
and a cumulative sum to define where the groups begin. Then aggregate:
select project_id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date > start_date - interval '30' day then 0 else 1 end) over
(partition by project_id order by start_date) as grp
from (select t.*,
lag(end_date) over (partition by project_id order by start_date) as prev_end_date
from t
) t
) t
group by project_id, grp;
Upvotes: 2