Reputation: 109
I have a problem with SQL. I have the following table:
declare @t table (START_DATE datetime,
END_DATE datetime,
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t
values ('2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
I would like to separate the dates which overlaps. For example I have in the first row START_DATE 2014-08-06 and END_DATE 2014-10-06. We can see that the dates from the second and the third row are inside this period of time from first row.
So I would like to separate them as follows:
declare @t2 table (START_DATE datetime,
END_DATE datetime,
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t2
values ('2014-08-06 00:00:00.000', '2014-09-05 23:59:59.000', 29.99),
('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
('2014-10-01 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
So the second and the third rows remained unchanged. The first row should have new END_DATE. We also have new row. The GROSS_SALES_PRICE should remain as it is in internal period. Thanks for help. I am using SQL Server 2014
Upvotes: 6
Views: 174
Reputation: 7753
How about using Lead to find the value from the next row:
SELECT START_DATE,
CASE
WHEN LEAD(Start_Date) OVER (ORDER BY Start_Date) < END_DATE
THEN COALESCE(DATEADD(s, -1, LEAD(Start_Date) OVER (ORDER BY Start_Date)), END_Date)
ELSE END_DATE END AS End_Date,
GROSS_SALES_PRICE
FROM @t
Or using a common table expression:
;WITH CTE
AS
(
SELECT Start_date,
End_Date,
LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
GROSS_SALES_PRICE
FROM @t
)
SELECT START_DATE,
CASE WHEN NextStartDate < END_DATE
THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date)
ELSE End_date END As End_Date,
GROSS_SALES_PRICE
FROM CTE
Updated to add missing row:
;WITH CTE
AS
(
SELECT Start_date,
End_Date,
LAG(END_Date) OVER (ORDER BY Start_Date) AS PreviousEndDate,
LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
GROSS_SALES_PRICE
FROM @t
)
SELECT START_DATE,
CASE WHEN NextStartDate < END_DATE
THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date)
ELSE End_date END As End_Date,
GROSS_SALES_PRICE
FROM CTE
UNION ALL
SELECT DATEADD(s, 1, PreviousEndDate), DATEADD(s, -1, Start_Date), GROSS_SALES_PRICE
FROM CTE
WHERE DATEDIFF(s, PreviousEndDate,Start_Date) > 1
ORDER BY 1
Upvotes: 0
Reputation: 32695
In addition to using datetime2
type instead of datetime
, I'd recommend you to use [Closed; Open)
intervals instead of [Closed; Closed]
. In other words, use 2014-08-06 00:00:00.000, 2014-09-06 00:00:00.000
instead of 2014-08-06 00:00:00.000, 2014-09-05 23:59:59.000
. Specifically, because 59.999
will be rounded to 00.000
for the datetime
type, but will not for datetime2(3)
. You don't want to depend on such internal details of the data types.
Also, [Closed; Open)
intervals are much easier to deal with in the queries as you'll see below.
The main idea is to put all start and end dates (boundaries) together in one list with a flag that indicates whether it is a beginning or end of the interval. When a running total of the flag turns into zero, it means that all overlapping intervals have ended.
Sample data
I extended your sample data with several cases of overlapping intervals.
declare @t table
(START_DATE datetime2(0),
END_DATE datetime2(0),
GROSS_SALES_PRICE decimal(10,2)
);
insert into @t
values
-- |------| 11
('2001-01-01 00:00:00', '2001-01-10 00:00:00', 11),
-- |------| 10
-- |------| 20
('2010-01-01 00:00:00', '2010-01-10 00:00:00', 10),
('2010-01-05 00:00:00', '2010-01-20 00:00:00', 20),
-- |----------| 30
-- |------| 40
('2010-02-01 00:00:00', '2010-02-20 00:00:00', 30),
('2010-02-05 00:00:00', '2010-02-20 00:00:00', 40),
-- |----------| 50
-- |----------| 60
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 50),
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 60),
-- |----------| 70
-- |------| 80
('2010-04-01 00:00:00', '2010-04-20 00:00:00', 70),
('2010-04-05 00:00:00', '2010-04-15 00:00:00', 80),
-- |-----------------------------| 29.99
-- |---------| 32.99
-- |---------| 32.99
-- |----------| 34.99
('2014-08-06 00:00:00', '2014-10-07 00:00:00', 29.99),
('2014-09-06 00:00:00', '2014-09-10 00:00:00', 32.99),
('2014-09-10 00:00:00', '2014-10-01 00:00:00', 32.99),
('2014-10-07 00:00:00', '2050-01-01 00:00:00', 34.99);
Query
WITH
CTE_Boundaries
AS
(
SELECT
START_DATE AS dt
,+1 AS Flag
,GROSS_SALES_PRICE AS Price
FROM @T
UNION ALL
SELECT
END_DATE AS dt
,-1 AS Flag
,GROSS_SALES_PRICE AS Price
FROM @T
)
,CTE_Intervals
AS
(
SELECT
dt
,Flag
,Price
,SUM(Flag) OVER (ORDER BY dt, Flag ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumFlag
,LEAD(dt) OVER (ORDER BY dt, Flag) AS NextDate
,LEAD(Price) OVER (ORDER BY dt, Flag) AS NextPrice
FROM CTE_Boundaries
)
SELECT
dt AS StartDate
,NextDate AS EndDate
,CASE WHEN Flag = 1 THEN Price ELSE NextPrice END AS Price
FROM CTE_Intervals
WHERE
SumFlag > 0
AND dt <> NextDate
ORDER BY StartDate
;
Result
+---------------------+---------------------+-------+
| StartDate | EndDate | Price |
+---------------------+---------------------+-------+
| 2001-01-01 00:00:00 | 2001-01-10 00:00:00 | 11.00 |
| 2010-01-01 00:00:00 | 2010-01-05 00:00:00 | 10.00 |
| 2010-01-05 00:00:00 | 2010-01-10 00:00:00 | 20.00 |
| 2010-01-10 00:00:00 | 2010-01-20 00:00:00 | 20.00 |
| 2010-02-01 00:00:00 | 2010-02-05 00:00:00 | 30.00 |
| 2010-02-05 00:00:00 | 2010-02-20 00:00:00 | 40.00 |
| 2010-03-01 00:00:00 | 2010-03-20 00:00:00 | 60.00 |
| 2010-04-01 00:00:00 | 2010-04-05 00:00:00 | 70.00 |
| 2010-04-05 00:00:00 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | 2010-04-20 00:00:00 | 70.00 |
this is your sample data:
| 2014-08-06 00:00:00 | 2014-09-06 00:00:00 | 29.99 |
| 2014-09-06 00:00:00 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | 2050-01-01 00:00:00 | 34.99 |
+---------------------+---------------------+-------+
Intermediary result of CTE_Intervals
Examine these to understand how the query works
+---------------------+------+-------+---------+---------------------+-----------+
| dt | Flag | Price | SumFlag | NextDate | NextPrice |
+---------------------+------+-------+---------+---------------------+-----------+
| 2001-01-01 00:00:00 | 1 | 11.00 | 1 | 2001-01-10 00:00:00 | 11.00 |
| 2001-01-10 00:00:00 | -1 | 11.00 | 0 | 2010-01-01 00:00:00 | 10.00 |
| 2010-01-01 00:00:00 | 1 | 10.00 | 1 | 2010-01-05 00:00:00 | 20.00 |
| 2010-01-05 00:00:00 | 1 | 20.00 | 2 | 2010-01-10 00:00:00 | 10.00 |
| 2010-01-10 00:00:00 | -1 | 10.00 | 1 | 2010-01-20 00:00:00 | 20.00 |
| 2010-01-20 00:00:00 | -1 | 20.00 | 0 | 2010-02-01 00:00:00 | 30.00 |
| 2010-02-01 00:00:00 | 1 | 30.00 | 1 | 2010-02-05 00:00:00 | 40.00 |
| 2010-02-05 00:00:00 | 1 | 40.00 | 2 | 2010-02-20 00:00:00 | 30.00 |
| 2010-02-20 00:00:00 | -1 | 30.00 | 1 | 2010-02-20 00:00:00 | 40.00 |
| 2010-02-20 00:00:00 | -1 | 40.00 | 0 | 2010-03-01 00:00:00 | 50.00 |
| 2010-03-01 00:00:00 | 1 | 50.00 | 1 | 2010-03-01 00:00:00 | 60.00 |
| 2010-03-01 00:00:00 | 1 | 60.00 | 2 | 2010-03-20 00:00:00 | 50.00 |
| 2010-03-20 00:00:00 | -1 | 50.00 | 1 | 2010-03-20 00:00:00 | 60.00 |
| 2010-03-20 00:00:00 | -1 | 60.00 | 0 | 2010-04-01 00:00:00 | 70.00 |
| 2010-04-01 00:00:00 | 1 | 70.00 | 1 | 2010-04-05 00:00:00 | 80.00 |
| 2010-04-05 00:00:00 | 1 | 80.00 | 2 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | -1 | 80.00 | 1 | 2010-04-20 00:00:00 | 70.00 |
| 2010-04-20 00:00:00 | -1 | 70.00 | 0 | 2014-08-06 00:00:00 | 29.99 |
| 2014-08-06 00:00:00 | 1 | 29.99 | 1 | 2014-09-06 00:00:00 | 32.99 |
| 2014-09-06 00:00:00 | 1 | 32.99 | 2 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | -1 | 32.99 | 1 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 1 | 32.99 | 2 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | -1 | 32.99 | 1 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | -1 | 29.99 | 0 | 2014-10-07 00:00:00 | 34.99 |
| 2014-10-07 00:00:00 | 1 | 34.99 | 1 | 2050-01-01 00:00:00 | 34.99 |
| 2050-01-01 00:00:00 | -1 | 34.99 | 0 | NULL | NULL |
+---------------------+------+-------+---------+---------------------+-----------+
Upvotes: 1
Reputation: 38033
A calendar/dates table can simplify this, but we can also use a query to generate a temporary dates table using a common table expression.
From there, we can solve this as a gaps and islands style problem. Using the dates table and using outer apply()
to get the latest values for start_date
and gross_sales_price
we can identify the groups we want to re-aggregate by using two row_number()
s. The first just ordered by date
, less the other that is partitioned by the value we have as the latest start_date
and ordered by date
.
Then you can dump the results of the common table expression src
to a temporary table and do your inserts/deletes using that or you can use merge
using src
.
/* -- dates --*/
declare @fromdate datetime, @thrudate datetime;
select @fromdate = min(start_date), @thrudate = max(end_date) from #t;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(datetime,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
, [End_Date]=convert(datetime,dateadd(millisecond,-3,dateadd(day,row_number() over(order by (select 1)),@fromdate)))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date]
)
/* -- islands -- */
, cte as (
select
start_date = d.date
, end_date = d.end_date
, x.gross_sales_price
, grp = row_number() over (order by d.date)
- row_number() over (partition by x.start_date order by d.date)
from dates d
outer apply (
select top 1 l.start_date, l.gross_sales_price
from #t l
where d.date >= l.start_date
and d.date <= l.end_date
order by l.start_date desc
) x
)
/* -- aggregated islands -- */
, src as (
select
start_date = min(start_date)
, end_date = max(end_date)
, gross_sales_price
from cte
group by gross_sales_price, grp
)
/* -- merge -- */
merge #t with (holdlock) as target
using src as source
on target.start_date = source.start_date
and target.end_date = source.end_date
and target.gross_sales_price = source.gross_sales_price
when not matched by target
then insert (start_date, end_date, gross_sales_price)
values (start_date, end_date, gross_sales_price)
when not matched by source
then delete
output $action, inserted.*, deleted.*;
/* -- results -- */
select
start_date
, end_date
, gross_sales_price
from #t
order by start_date
rextester demo: http://rextester.com/MFXCQQ90933
merge
output (you do not need to output this, just showing for the demo):
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| $action | START_DATE | END_DATE | GROSS_SALES_PRICE | START_DATE | END_DATE | GROSS_SALES_PRICE |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| INSERT | 2014-10-01 00:00:00 | 2014-10-06 23:59:59 | 29.99 | NULL | NULL | NULL |
| INSERT | 2014-08-06 00:00:00 | 2014-09-05 23:59:59 | 29.99 | NULL | NULL | NULL |
| DELETE | NULL | NULL | NULL | 2014-08-06 00:00:00 | 2014-10-06 23:59:59 | 29.99 |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
results:
+-------------------------+-------------------------+-------------------+
| start_date | end_date | gross_sales_price |
+-------------------------+-------------------------+-------------------+
| 2014-08-06 00:00:00.000 | 2014-09-05 23:59:59.997 | 29.99 |
| 2014-09-06 00:00:00.000 | 2014-09-09 23:59:59.997 | 32.99 |
| 2014-09-10 00:00:00.000 | 2014-09-30 23:59:59.997 | 32.99 |
| 2014-10-01 00:00:00.000 | 2014-10-06 23:59:59.997 | 29.99 |
| 2014-10-07 00:00:00.000 | 2049-12-31 23:59:59.997 | 34.99 |
+-------------------------+-------------------------+-------------------+
calendar and numbers tables reference:
merge
reference:
MERGE
Statement - Aaron BertrandMerge
- Dan GuzmanMERGE
Bug - Paul Whitemerge
statement - Aaron BertrandMERGE
, please read this! - Aaron BertrandMerge
Statement (LCK_M_RS_U locks) - Kendra Littlemerge
statements the right way - David SteinUpvotes: 3
Reputation: 850
This can be solved with simple joins and unions. However better with an ID. The common table expression is only to add an ID.
declare @t table(START_DATE datetime,END_DATE datetime, GROSS_SALES_PRICE
decimal(10,2));
insert into @t values
( '2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
( '2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
( '2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
( '2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)
;with t_cte as
(select row_number() over( order by start_date,end_date,GROSS_SALES_PRICE) ID,*
from @t
)
select t1.start_date,min(t2.start_date),t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select min(t2.start_date),t1.end_date,t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select t1.start_date,t1.END_DATE,t1.GROSS_SALES_PRICE
from t_cte t1
left join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
where t2.id is null
order by 1,2,3
Upvotes: 0
Reputation: 1
Note: following solution comes with few assumptions
[1] It's using LEAD function => SQL2012+
[2] All DATETIME columns are mandatory => NOT NULL
[3] All DATETIME values (across both columns) are unique.
select y.*
from (
select t.ID, x.DT AS NEW_START_DATE, DATEADD(MILLISECOND, -3, LEAD(x.DT) OVER(ORDER BY x.DT ASC)) AS NEW_END_DATE
from @t as t
outer apply (
select t.START_DATE, 1
union all
select t.END_DATE, 2
) as x(DT, [TYPE])
) as y
where y.NEW_END_DATE IS NOT NULL
order by y.NEW_START_DATE
Upvotes: 0