Reputation: 643
I have data as below and want to merge the records for overlapping dates. MIN and MAX of start and end dates for overlapping records should be the Start and end date of merged record.
Before merge:
Item Code Start_date End_date
============== =========== ===========
111 15-May-2004 20-Jun-2004
111 22-May-2004 07-Jun-2004
111 20-Jun-2004 13-Aug-2004
111 27-May-2004 30-Aug-2004
111 02-Sep-2004 23-Dec-2004
222 21-May-2004 19-Aug-2004
Required output:
Item Code Start_date End_date
============== =========== ===========
111 15-May-2004 30-Aug-2004
111 02-Sep-2004 23-Dec-2004
222 21-May-2004 19-Aug-2004
you can create sample data using
create table item(item_code number, start_date date, end_date date);
insert into item values (111,to_date('15-May-2004','DD-Mon-YYYY'),to_date('20-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('22-May-2004','DD-Mon-YYYY'),to_date('07-Jun-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('20-Jun-2004','DD-Mon-YYYY'),to_date('13-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('27-May-2004','DD-Mon-YYYY'),to_date('30-Aug-2004','DD-Mon-YYYY'));
insert into item values (111,to_date('02-Sep-2004','DD-Mon-YYYY'),to_date('23-Dec-2004','DD-Mon-YYYY'));
insert into item values (222,to_date('21-May-2004','DD-Mon-YYYY'),to_date('19-Aug-2004','DD-Mon-YYYY'));
commit;
Upvotes: 2
Views: 184
Reputation: 643
with help of above answers i am able to simplify this as below
WITH max_dates AS
(
SELECT
item_code
,start_date
,end_date
,Max(end_date)
Over (PARTITION BY item_code
ORDER BY start_date
) AS max_date
FROM item
) ,
max_dates1 as
(
select max_dates.* , lag(max_date) over(partition by item_code order by 1) as MPD from max_dates
)
select ITEM_CODE,start_date,end_date from max_dates1
WHERE MPD < start_date
OR MPD IS NULL
Upvotes: 0
Reputation: 60462
It's a variation of a gaps&islands problem. First calculate the maximum previous end date for each row. Then filter the rows where the current row's start date is greater than that max date, this is the start of a new group and the group's end date is found in the next row.
WITH max_dates AS
(
SELECT
item_code
,start_date
,Max(end_date) -- get the maximum prevous end_date
Over (PARTITION BY item_code
ORDER BY start_date
ROWS BETWEEN Unbounded Preceding AND 1 Preceding) AS max_prev_date
,Max(end_date) -- get the maximum overall date (only needed for the last group)
Over (PARTITION BY item_code) AS max_date
FROM item
)
SELECT
item_code
,start_date
,Coalesce(Lead(max_prev_date) -- next row got the end date for the current row
Over (PARTITION BY item_code
ORDER BY start_date)
,max_date ) AS end_date -- no next row for the last row --> overall maximum end_date
FROM max_dates
WHERE max_prev_date < start_date -- maximum previous end date is less than current start date --> start of a new group
OR max_prev_date IS NULL -- first row
Upvotes: 2
Reputation: 110
In SQL Server you can try this. It will give your desired output but as performance point of view the query might slow down, When there is a large number of data to be checked.
DECLARE @item Table(item_code int, start_date date, end_date date);
insert into @item values (111,'15-May-2004','20-Jun-2004');
insert into @item values (111,'22-May-2004','07-Jun-2004');
insert into @item values (111,'20-Jun-2004','13-Aug-2004');
insert into @item values (111,'27-May-2004','30-Aug-2004');
insert into @item values (111,'02-Sep-2004','23-Dec-2004');
insert into @item values (222,'21-May-2004','19-Aug-2004');
SELECT * FROM @item WHERE item_code IN (SELECT item_code FROM @item GROUP BY item_code) AND
(start_date IN (SELECT max(start_date) FROM @item GROUP BY item_code) or start_date In (SELECT min(start_date) FROM @item GROUP BY item_code))
Upvotes: 0
Reputation: 1269493
The code for this type of problem is rather tricky. Here is one approach that works pretty well:
with item (item_code, start_date, end_date) as (
select 111,to_date('15-05-2004','DD-MM-YYYY'),to_date('20-06-2004','DD-MM-YYYY') from dual union all
select 111,to_date('22-05-2004','DD-MM-YYYY'),to_date('07-06-2004','DD-MM-YYYY') from dual union all
select 111,to_date('20-06-2004','DD-MM-YYYY'),to_date('13-08-2004','DD-MM-YYYY') from dual union all
select 111,to_date('27-05-2004','DD-MM-YYYY'),to_date('30-08-2004','DD-MM-YYYY') from dual union all
select 111,to_date('02-09-2004','DD-MM-YYYY'),to_date('23-12-2004','DD-MM-YYYY') from dual union all
select 222,to_date('21-05-2004','DD-MM-YYYY'),to_date('19-08-2004','DD-MM-YYYY') from dual
),
id as (
select item_code, start_date as dte, count(*) as inc
from item
group by item_code, start_date
union all
select item_code, end_date, - count(*) as inc
from item
group by item_code, end_date
),
id2 as (
select id.*, sum(inc) over (partition by item_code order by dte) as running_inc
from id
),
id3 as (
select id2.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by item_code order by dte desc) as grp
from id2
)
select item_code, min(dte) as start_date, max(dte) as end_date
from id3
group by item_code, grp;
And a rextester to validate it.
What is this doing? Good question. The idea in these problems is to define the adjacent groups. This method does so by counting the number of "starts" and "ends" up to a given date. When the value is 0, a group ends.
The specific steps are as follows:
(1) Break out all the dates onto separate rows along with an indicator of whether the date is a start date or end date. This indicator is key to defining the ranges -- +1 to "enter" and "-1" to exit.
(2) Calculate the running total of the indicators. The 0s in this total are the ends of overlapping ranges.
(3) Do a reverse cumulative sum of the 0s to identify the groups.
(4) Aggregate to get the final results.
You can look at each of the CTEs to see what is happening in the data.
Upvotes: 3