Alexander
Alexander

Reputation: 13

Gaps and Islands with start en end date (ValidPeriod)

I have searched but can not find the solution to the following problem.

I have several price lists with a couple of million rows, and I have found many examples which can be aggregated into one row because the start and end dates of the group are consecutive (enddate:20151231 next startdate: 20160101)

But I also found many gaps, meaning the straight forward approach using min() and max() function is not applicable as possible gaps will be ignored.

The following contains a #Prices table with example records and a #Target table with the result I am shooting for:

Thanks.

My definition of a gap is when two consecutive records have more than 1 day between them.

if object_id('tempdb..#Prices', 'table') is not null
    drop table #Prices
;

create table #Prices (
    Product         varchar(50) not null
  , Value           decimal(18,5) not null
  , ValidFrom       date not null
  , ValidTo         date null
)

insert into #Prices
(
    Product
  , Value
  , ValidFrom   
  , ValidTo    
)
select
    Product          = 'Island A'
  , Value            = 10.10
  , ValidFrom        = '20140101'
  , ValidTo          = '20140606'
union all
select
    Product          = 'Island A'
  , Value            = 10.10
  , ValidFrom        = '20140607'
  , ValidTo          = '20141010'
union all
select
    Product          = 'Island A'
  , Value            = 10.11
  , ValidFrom        = '20141011'
  , ValidTo          = '20141231'
union all
select
    Product          = 'Island A'
  , Value            = 11.10
  , ValidFrom        = '20150101'
  , ValidTo          = '20151231'
union all
select
    Product          = 'Island A'
  , Value            = 10.10
  , ValidFrom        = '20160101'
  , ValidTo          = null
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20140101'
  , ValidTo          = '20140606'
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20140607'
  , ValidTo          = '20141010'
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20150101'
  , ValidTo          = '20151231'
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20160101'
  , ValidTo          = null

select * 
from #Prices as P
order by P.Product, P.ValidFrom
;


if object_id('tempdb..#Target', 'table') is not null
    drop table #Target
;

create table #Target (
    Product         varchar(50) not null
  , Value           decimal(18,5) not null
  , ValidFrom       date not null
  , ValidTo         date null
)

insert into #Target
(
    Product
  , Value
  , ValidFrom   
  , ValidTo    
)
select
    Product          = 'Island A'
  , Value            = 10.10
  , ValidFrom        = '20140101'
  , ValidTo          = '20141010'
union all
select
    Product          = 'Island A'
  , Value            = 10.11
  , ValidFrom        = '20141011'
  , ValidTo          = '20141231'
union all
select
    Product          = 'Island A'
  , Value            = 11.10
  , ValidFrom        = '20150101'
  , ValidTo          = '20151231'
union all
select
    Product          = 'Island A'
  , Value            = 10.10
  , ValidFrom        = '20160101'
  , ValidTo          = null
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20140101'
  , ValidTo          = '20141010'
union all
select
    Product          = 'Gap B'
  , Value            = 20.10
  , ValidFrom        = '20150101'
  , ValidTo          = null

select * 
from #Target as P
order by P.Product, P.ValidFrom
;

EDIT I hope the edit is the answer to your question. The records which are consecutive (max 1 day between the records) can be aggregated by taking the min(ValidFrom) and the max(ValidTo). the problem is with the gaps, these will be ignored. The result for Product 'Gap B' will then be one record. Any hit on this record with a Date will get the value 20.10 even when the date is in the period of the Gap.

Gap B    |    20.10    |    20140101    |     null

Therefore I need 2 records so all the joins on the table will result in the correct value and no value in the period of the Gap

Gap B    |    20.10    |    20140101    |     20141010
Gap B    |    20.10    |    20151231    |     null

Upvotes: 1

Views: 1029

Answers (2)

John Cappelletti
John Cappelletti

Reputation: 81930

Had to bake in a little logic for the NULL ValidTo's

;with cte0(N)   as (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
     ,cte1(R,D) as (Select Row_Number() over (Order By (Select Null))
                          ,DateAdd(DD,Row_Number() over (Order By (Select Null)) -1,(Select min(ValidFrom) From  #Prices)) 
                     From  cte0 N1, cte0 N2, cte0 N3, cte0 N4) 
Select Product
      ,Value    
      ,ValidFrom = Min(ValidFrom)
      ,ValidTo   = nullif(max(isnull(ValidTo,'2099-12-31')),'2099-12-31')
 From (
         Select *
               ,Island = R - Row_Number() over (Partition By Product,Value Order by ValidFrom)
          From  #Prices  A
          Join  cte1     B on D Between ValidFrom and IsNull(ValidTo,'2099-12-31')
      ) A
 Group By Product,Value,Island
 Order By 1 Desc,3

Returns

Product     Value       ValidFrom   ValidTo
Island A    10.10000    2014-01-01  2014-10-10
Island A    10.11000    2014-10-11  2014-12-31
Island A    11.10000    2015-01-01  2015-12-31
Island A    10.10000    2016-01-01  NULL
Gap B       20.10000    2014-01-01  2014-10-10
Gap B       20.10000    2015-01-01  NULL

Upvotes: 1

iamdave
iamdave

Reputation: 12243

Here is a different solution that uses a recursive cte, which I think is a little easier to understand compared to Jon's. On this amount of data it is also a lot more efficient, though you will need to test the performance yourself for a larger dataset:

;with rownum
as
(
    select row_number() over (order by Product, ValidFrom) as rn
            ,Product
            ,Value
            ,ValidFrom
            ,ValidTo
    from #Prices
)
,cte
as
(
    select rn
            ,Product
            ,Value
            ,ValidFrom
            ,ValidFrom as ValidFrom2
            ,ValidTo
    from rownum
    where rn = 1

    union all

    select r.rn
            ,r.Product
            ,r.Value

            ,r.ValidFrom
            ,case when c.Product = r.Product
                    then case when dateadd(d,1,c.ValidTo) = r.ValidFrom
                            then c.ValidFrom
                            else r.ValidFrom
                            end
                    else r.ValidFrom
                    end as ValidFrom2

            ,isnull(r.ValidTo,'29990101') as ValidTo
    from rownum r
        inner join cte c
            on(r.rn = c.rn+1)
)
select Product
        ,Value
        ,ValidFrom2 as ValidFrom
        ,nullif(max(ValidTo),'29990101') as ValidTo
from cte
group by Product
        ,Value
        ,ValidFrom2
order by Product
        ,ValidFrom2;

Upvotes: 2

Related Questions