Rafał Kobiela
Rafał Kobiela

Reputation: 109

Divide rows with date in SQL Server 2014

I have a problem with SQL. I have the following table:

declare @t table (START_DATE datetime,
                  END_DATE datetime, 
                  GROSS_SALES_PRICE decimal(10,2)
                 );

insert into @t 
values ('2014-08-06 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
       ('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
       ('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
       ('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)

I would like to separate the dates which overlaps. For example I have in the first row START_DATE 2014-08-06 and END_DATE 2014-10-06. We can see that the dates from the second and the third row are inside this period of time from first row.

So I would like to separate them as follows:

declare @t2 table (START_DATE datetime,
                   END_DATE datetime, 
                   GROSS_SALES_PRICE decimal(10,2)
                  );

insert into @t2 
values ('2014-08-06 00:00:00.000', '2014-09-05 23:59:59.000', 29.99),
       ('2014-09-06 00:00:00.000', '2014-09-09 23:59:59.000', 32.99),
       ('2014-09-10 00:00:00.000', '2014-09-30 23:59:59.000', 32.99),
       ('2014-10-01 00:00:00.000', '2014-10-06 23:59:59.000', 29.99),
       ('2014-10-07 00:00:00.000', '2049-12-31 23:59:59.000', 34.99)

So the second and the third rows remained unchanged. The first row should have new END_DATE. We also have new row. The GROSS_SALES_PRICE should remain as it is in internal period. Thanks for help. I am using SQL Server 2014

Upvotes: 6

Views: 174

Answers (5)

Steve Ford
Steve Ford

Reputation: 7753

How about using Lead to find the value from the next row:

SELECT START_DATE, 
    CASE 
        WHEN LEAD(Start_Date) OVER (ORDER BY Start_Date) < END_DATE 
        THEN COALESCE(DATEADD(s, -1, LEAD(Start_Date) OVER (ORDER BY Start_Date)), END_Date)
        ELSE END_DATE END AS End_Date,
    GROSS_SALES_PRICE
 FROM @t

Or using a common table expression:

;WITH CTE
 AS
 (
    SELECT Start_date,
           End_Date,
           LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
           GROSS_SALES_PRICE
    FROM @t
)
SELECT START_DATE,
        CASE WHEN NextStartDate < END_DATE 
            THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date) 
            ELSE End_date END As End_Date,
        GROSS_SALES_PRICE
FROM CTE

Updated to add missing row:

;WITH CTE
 AS
 (
    SELECT Start_date,
           End_Date,
           LAG(END_Date) OVER (ORDER BY Start_Date) AS PreviousEndDate,
           LEAD(Start_Date) OVER (ORDER BY Start_Date) AS NextStartDate,
           GROSS_SALES_PRICE
    FROM @t
)
SELECT START_DATE,
        CASE WHEN NextStartDate < END_DATE 
            THEN Coalesce(DATEADD(s, -1, NextStartDate), End_Date) 
            ELSE End_date END As End_Date,
        GROSS_SALES_PRICE
FROM CTE
UNION ALL
SELECT DATEADD(s, 1, PreviousEndDate), DATEADD(s, -1, Start_Date), GROSS_SALES_PRICE
FROM CTE
WHERE DATEDIFF(s, PreviousEndDate,Start_Date) > 1
ORDER BY 1

Upvotes: 0

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

In addition to using datetime2 type instead of datetime, I'd recommend you to use [Closed; Open) intervals instead of [Closed; Closed]. In other words, use 2014-08-06 00:00:00.000, 2014-09-06 00:00:00.000 instead of 2014-08-06 00:00:00.000, 2014-09-05 23:59:59.000. Specifically, because 59.999 will be rounded to 00.000 for the datetime type, but will not for datetime2(3). You don't want to depend on such internal details of the data types.

Also, [Closed; Open) intervals are much easier to deal with in the queries as you'll see below.

The main idea is to put all start and end dates (boundaries) together in one list with a flag that indicates whether it is a beginning or end of the interval. When a running total of the flag turns into zero, it means that all overlapping intervals have ended.

Sample data

I extended your sample data with several cases of overlapping intervals.

declare @t table 
(START_DATE datetime2(0),
END_DATE datetime2(0), 
GROSS_SALES_PRICE decimal(10,2)
);

insert into @t 
values 
-- |------|                                    11
('2001-01-01 00:00:00', '2001-01-10 00:00:00', 11),

-- |------|                                    10
--     |------|                                20
('2010-01-01 00:00:00', '2010-01-10 00:00:00', 10),
('2010-01-05 00:00:00', '2010-01-20 00:00:00', 20),

-- |----------|                                30
--     |------|                                40
('2010-02-01 00:00:00', '2010-02-20 00:00:00', 30),
('2010-02-05 00:00:00', '2010-02-20 00:00:00', 40),

-- |----------|                                50
-- |----------|                                60
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 50),
('2010-03-01 00:00:00', '2010-03-20 00:00:00', 60),

-- |----------|                                70
--   |------|                                  80
('2010-04-01 00:00:00', '2010-04-20 00:00:00', 70),
('2010-04-05 00:00:00', '2010-04-15 00:00:00', 80),

-- |-----------------------------|             29.99
--      |---------|                            32.99
--                |---------|                  32.99
--                               |----------|  34.99
('2014-08-06 00:00:00', '2014-10-07 00:00:00', 29.99),
('2014-09-06 00:00:00', '2014-09-10 00:00:00', 32.99),
('2014-09-10 00:00:00', '2014-10-01 00:00:00', 32.99),
('2014-10-07 00:00:00', '2050-01-01 00:00:00', 34.99);

Query

WITH
CTE_Boundaries
AS
(
    SELECT
        START_DATE AS dt
        ,+1 AS Flag
        ,GROSS_SALES_PRICE AS Price
    FROM @T

    UNION ALL

    SELECT
        END_DATE AS dt
        ,-1 AS Flag
        ,GROSS_SALES_PRICE AS Price
    FROM @T
)
,CTE_Intervals
AS
(
    SELECT
        dt
        ,Flag
        ,Price
        ,SUM(Flag) OVER (ORDER BY dt, Flag ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumFlag
        ,LEAD(dt) OVER (ORDER BY dt, Flag) AS NextDate
        ,LEAD(Price) OVER (ORDER BY dt, Flag) AS NextPrice
    FROM CTE_Boundaries
)
SELECT
    dt AS StartDate
    ,NextDate AS EndDate
    ,CASE WHEN Flag = 1 THEN Price ELSE NextPrice END AS Price
FROM CTE_Intervals
WHERE
    SumFlag > 0
    AND dt <> NextDate
ORDER BY StartDate
;

Result

+---------------------+---------------------+-------+
|      StartDate      |       EndDate       | Price |
+---------------------+---------------------+-------+
| 2001-01-01 00:00:00 | 2001-01-10 00:00:00 | 11.00 |
| 2010-01-01 00:00:00 | 2010-01-05 00:00:00 | 10.00 |
| 2010-01-05 00:00:00 | 2010-01-10 00:00:00 | 20.00 |
| 2010-01-10 00:00:00 | 2010-01-20 00:00:00 | 20.00 |
| 2010-02-01 00:00:00 | 2010-02-05 00:00:00 | 30.00 |
| 2010-02-05 00:00:00 | 2010-02-20 00:00:00 | 40.00 |
| 2010-03-01 00:00:00 | 2010-03-20 00:00:00 | 60.00 |
| 2010-04-01 00:00:00 | 2010-04-05 00:00:00 | 70.00 |
| 2010-04-05 00:00:00 | 2010-04-15 00:00:00 | 80.00 |
| 2010-04-15 00:00:00 | 2010-04-20 00:00:00 | 70.00 |

this is your sample data:

| 2014-08-06 00:00:00 | 2014-09-06 00:00:00 | 29.99 |
| 2014-09-06 00:00:00 | 2014-09-10 00:00:00 | 32.99 |
| 2014-09-10 00:00:00 | 2014-10-01 00:00:00 | 32.99 |
| 2014-10-01 00:00:00 | 2014-10-07 00:00:00 | 29.99 |
| 2014-10-07 00:00:00 | 2050-01-01 00:00:00 | 34.99 |
+---------------------+---------------------+-------+

Intermediary result of CTE_Intervals

Examine these to understand how the query works

+---------------------+------+-------+---------+---------------------+-----------+
|         dt          | Flag | Price | SumFlag |      NextDate       | NextPrice |
+---------------------+------+-------+---------+---------------------+-----------+
| 2001-01-01 00:00:00 |    1 | 11.00 |       1 | 2001-01-10 00:00:00 | 11.00     |
| 2001-01-10 00:00:00 |   -1 | 11.00 |       0 | 2010-01-01 00:00:00 | 10.00     |
| 2010-01-01 00:00:00 |    1 | 10.00 |       1 | 2010-01-05 00:00:00 | 20.00     |
| 2010-01-05 00:00:00 |    1 | 20.00 |       2 | 2010-01-10 00:00:00 | 10.00     |
| 2010-01-10 00:00:00 |   -1 | 10.00 |       1 | 2010-01-20 00:00:00 | 20.00     |
| 2010-01-20 00:00:00 |   -1 | 20.00 |       0 | 2010-02-01 00:00:00 | 30.00     |
| 2010-02-01 00:00:00 |    1 | 30.00 |       1 | 2010-02-05 00:00:00 | 40.00     |
| 2010-02-05 00:00:00 |    1 | 40.00 |       2 | 2010-02-20 00:00:00 | 30.00     |
| 2010-02-20 00:00:00 |   -1 | 30.00 |       1 | 2010-02-20 00:00:00 | 40.00     |
| 2010-02-20 00:00:00 |   -1 | 40.00 |       0 | 2010-03-01 00:00:00 | 50.00     |
| 2010-03-01 00:00:00 |    1 | 50.00 |       1 | 2010-03-01 00:00:00 | 60.00     |
| 2010-03-01 00:00:00 |    1 | 60.00 |       2 | 2010-03-20 00:00:00 | 50.00     |
| 2010-03-20 00:00:00 |   -1 | 50.00 |       1 | 2010-03-20 00:00:00 | 60.00     |
| 2010-03-20 00:00:00 |   -1 | 60.00 |       0 | 2010-04-01 00:00:00 | 70.00     |
| 2010-04-01 00:00:00 |    1 | 70.00 |       1 | 2010-04-05 00:00:00 | 80.00     |
| 2010-04-05 00:00:00 |    1 | 80.00 |       2 | 2010-04-15 00:00:00 | 80.00     |
| 2010-04-15 00:00:00 |   -1 | 80.00 |       1 | 2010-04-20 00:00:00 | 70.00     |
| 2010-04-20 00:00:00 |   -1 | 70.00 |       0 | 2014-08-06 00:00:00 | 29.99     |
| 2014-08-06 00:00:00 |    1 | 29.99 |       1 | 2014-09-06 00:00:00 | 32.99     |
| 2014-09-06 00:00:00 |    1 | 32.99 |       2 | 2014-09-10 00:00:00 | 32.99     |
| 2014-09-10 00:00:00 |   -1 | 32.99 |       1 | 2014-09-10 00:00:00 | 32.99     |
| 2014-09-10 00:00:00 |    1 | 32.99 |       2 | 2014-10-01 00:00:00 | 32.99     |
| 2014-10-01 00:00:00 |   -1 | 32.99 |       1 | 2014-10-07 00:00:00 | 29.99     |
| 2014-10-07 00:00:00 |   -1 | 29.99 |       0 | 2014-10-07 00:00:00 | 34.99     |
| 2014-10-07 00:00:00 |    1 | 34.99 |       1 | 2050-01-01 00:00:00 | 34.99     |
| 2050-01-01 00:00:00 |   -1 | 34.99 |       0 | NULL                | NULL      |
+---------------------+------+-------+---------+---------------------+-----------+

Upvotes: 1

SqlZim
SqlZim

Reputation: 38033

A calendar/dates table can simplify this, but we can also use a query to generate a temporary dates table using a common table expression.

From there, we can solve this as a gaps and islands style problem. Using the dates table and using outer apply() to get the latest values for start_date and gross_sales_price we can identify the groups we want to re-aggregate by using two row_number()s. The first just ordered by date, less the other that is partitioned by the value we have as the latest start_date and ordered by date.

Then you can dump the results of the common table expression src to a temporary table and do your inserts/deletes using that or you can use merge using src.

/* -- dates --*/
declare @fromdate datetime, @thrudate datetime;
select  @fromdate = min(start_date), @thrudate = max(end_date) from #t;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
  select top (datediff(day, @fromdate, @thrudate)+1) 
      [Date]=convert(datetime,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
    , [End_Date]=convert(datetime,dateadd(millisecond,-3,dateadd(day,row_number() over(order by (select 1)),@fromdate)))
  from n as deka cross join n as hecto cross join n as kilo
                 cross join n as tenK cross join n as hundredK
   order by [Date]
)
/* -- islands -- */
, cte as (
select 
    start_date = d.date
  , end_date   = d.end_date
  , x.gross_sales_price
  , grp = row_number() over (order by d.date)
        - row_number() over (partition by x.start_date order by d.date)
from dates d
  outer apply (
    select top 1 l.start_date, l.gross_sales_price
    from #t l
    where d.date >= l.start_date
      and d.date <= l.end_date
    order by l.start_date desc
    ) x
)
/* -- aggregated islands -- */
, src as (
select 
    start_date = min(start_date)
  , end_date   = max(end_date)  
  , gross_sales_price
from cte
group by gross_sales_price, grp
)
/* -- merge -- */
merge #t with (holdlock) as target
using src as source
  on target.start_date = source.start_date
 and target.end_date   = source.end_date
 and target.gross_sales_price = source.gross_sales_price
when not matched by target 
  then insert (start_date, end_date, gross_sales_price)
    values (start_date, end_date, gross_sales_price)
when not matched by source 
  then delete
output $action, inserted.*, deleted.*;
/* -- results -- */
select 
    start_date
  , end_date  
  , gross_sales_price
from #t 
order by start_date

rextester demo: http://rextester.com/MFXCQQ90933

merge output (you do not need to output this, just showing for the demo):

+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| $action |     START_DATE      |      END_DATE       | GROSS_SALES_PRICE |     START_DATE      |      END_DATE       | GROSS_SALES_PRICE |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+
| INSERT  | 2014-10-01 00:00:00 | 2014-10-06 23:59:59 | 29.99             | NULL                | NULL                | NULL              |
| INSERT  | 2014-08-06 00:00:00 | 2014-09-05 23:59:59 | 29.99             | NULL                | NULL                | NULL              |
| DELETE  | NULL                | NULL                | NULL              | 2014-08-06 00:00:00 | 2014-10-06 23:59:59 | 29.99             |
+---------+---------------------+---------------------+-------------------+---------------------+---------------------+-------------------+

results:

+-------------------------+-------------------------+-------------------+
|       start_date        |        end_date         | gross_sales_price |
+-------------------------+-------------------------+-------------------+
| 2014-08-06 00:00:00.000 | 2014-09-05 23:59:59.997 | 29.99             |
| 2014-09-06 00:00:00.000 | 2014-09-09 23:59:59.997 | 32.99             |
| 2014-09-10 00:00:00.000 | 2014-09-30 23:59:59.997 | 32.99             |
| 2014-10-01 00:00:00.000 | 2014-10-06 23:59:59.997 | 29.99             |
| 2014-10-07 00:00:00.000 | 2049-12-31 23:59:59.997 | 34.99             |
+-------------------------+-------------------------+-------------------+

calendar and numbers tables reference:

merge reference:

Upvotes: 3

Peter
Peter

Reputation: 850

This can be solved with simple joins and unions. However better with an ID. The common table expression is only to add an ID.

declare @t table(START_DATE datetime,END_DATE datetime, GROSS_SALES_PRICE 
 decimal(10,2));
insert into @t values
 ( '2014-08-06 00:00:00.000',   '2014-10-06 23:59:59.000',  29.99),
 ( '2014-09-06 00:00:00.000',   '2014-09-09 23:59:59.000',  32.99),
 ( '2014-09-10 00:00:00.000',   '2014-09-30 23:59:59.000',  32.99),
 ( '2014-10-07 00:00:00.000',   '2049-12-31 23:59:59.000',  34.99)

;with t_cte as
(select row_number() over( order by start_date,end_date,GROSS_SALES_PRICE) ID,*
from @t
)

select t1.start_date,min(t2.start_date),t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select min(t2.start_date),t1.end_date,t1.GROSS_SALES_PRICE
from t_cte t1
join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
group by t1.START_DATE,t1.END_DATE,t1.GROSS_SALES_PRICE
union all
select t1.start_date,t1.END_DATE,t1.GROSS_SALES_PRICE
from t_cte t1
left join t_cte t2 on t1.END_DATE > t2.START_DATE and t1.END_DATE> t2.START_DATE and t1.id< t2.id
where t2.id is null
order by 1,2,3

Upvotes: 0

Bogdan Sahlean
Bogdan Sahlean

Reputation: 1

Note: following solution comes with few assumptions

[1] It's using LEAD function => SQL2012+

[2] All DATETIME columns are mandatory => NOT NULL

[3] All DATETIME values (across both columns) are unique.

select y.*
from (
    select t.ID, x.DT AS NEW_START_DATE, DATEADD(MILLISECOND, -3, LEAD(x.DT) OVER(ORDER BY x.DT ASC)) AS NEW_END_DATE
    from @t as t
    outer apply (
        select t.START_DATE, 1
        union all
        select t.END_DATE, 2 
    ) as x(DT, [TYPE])
) as y
where y.NEW_END_DATE IS NOT NULL
order by y.NEW_START_DATE

Upvotes: 0

Related Questions