Reputation: 13
I have searched but can not find the solution to the following problem.
I have several price lists with a couple of million rows, and I have found many examples which can be aggregated into one row because the start and end dates of the group are consecutive (enddate:20151231 next startdate: 20160101)
But I also found many gaps, meaning the straight forward approach using min() and max() function is not applicable as possible gaps will be ignored.
The following contains a #Prices table with example records and a #Target table with the result I am shooting for:
Thanks.
My definition of a gap is when two consecutive records have more than 1 day between them.
if object_id('tempdb..#Prices', 'table') is not null
drop table #Prices
;
create table #Prices (
Product varchar(50) not null
, Value decimal(18,5) not null
, ValidFrom date not null
, ValidTo date null
)
insert into #Prices
(
Product
, Value
, ValidFrom
, ValidTo
)
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140101'
, ValidTo = '20140606'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140607'
, ValidTo = '20141010'
union all
select
Product = 'Island A'
, Value = 10.11
, ValidFrom = '20141011'
, ValidTo = '20141231'
union all
select
Product = 'Island A'
, Value = 11.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20160101'
, ValidTo = null
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140101'
, ValidTo = '20140606'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140607'
, ValidTo = '20141010'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20160101'
, ValidTo = null
select *
from #Prices as P
order by P.Product, P.ValidFrom
;
if object_id('tempdb..#Target', 'table') is not null
drop table #Target
;
create table #Target (
Product varchar(50) not null
, Value decimal(18,5) not null
, ValidFrom date not null
, ValidTo date null
)
insert into #Target
(
Product
, Value
, ValidFrom
, ValidTo
)
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140101'
, ValidTo = '20141010'
union all
select
Product = 'Island A'
, Value = 10.11
, ValidFrom = '20141011'
, ValidTo = '20141231'
union all
select
Product = 'Island A'
, Value = 11.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20160101'
, ValidTo = null
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140101'
, ValidTo = '20141010'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20150101'
, ValidTo = null
select *
from #Target as P
order by P.Product, P.ValidFrom
;
EDIT I hope the edit is the answer to your question. The records which are consecutive (max 1 day between the records) can be aggregated by taking the min(ValidFrom) and the max(ValidTo). the problem is with the gaps, these will be ignored. The result for Product 'Gap B' will then be one record. Any hit on this record with a Date will get the value 20.10 even when the date is in the period of the Gap.
Gap B | 20.10 | 20140101 | null
Therefore I need 2 records so all the joins on the table will result in the correct value and no value in the period of the Gap
Gap B | 20.10 | 20140101 | 20141010
Gap B | 20.10 | 20151231 | null
Upvotes: 1
Views: 1029
Reputation: 81930
Had to bake in a little logic for the NULL ValidTo's
;with cte0(N) as (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N))
,cte1(R,D) as (Select Row_Number() over (Order By (Select Null))
,DateAdd(DD,Row_Number() over (Order By (Select Null)) -1,(Select min(ValidFrom) From #Prices))
From cte0 N1, cte0 N2, cte0 N3, cte0 N4)
Select Product
,Value
,ValidFrom = Min(ValidFrom)
,ValidTo = nullif(max(isnull(ValidTo,'2099-12-31')),'2099-12-31')
From (
Select *
,Island = R - Row_Number() over (Partition By Product,Value Order by ValidFrom)
From #Prices A
Join cte1 B on D Between ValidFrom and IsNull(ValidTo,'2099-12-31')
) A
Group By Product,Value,Island
Order By 1 Desc,3
Returns
Product Value ValidFrom ValidTo
Island A 10.10000 2014-01-01 2014-10-10
Island A 10.11000 2014-10-11 2014-12-31
Island A 11.10000 2015-01-01 2015-12-31
Island A 10.10000 2016-01-01 NULL
Gap B 20.10000 2014-01-01 2014-10-10
Gap B 20.10000 2015-01-01 NULL
Upvotes: 1
Reputation: 12243
Here is a different solution that uses a recursive cte, which I think is a little easier to understand compared to Jon's. On this amount of data it is also a lot more efficient, though you will need to test the performance yourself for a larger dataset:
;with rownum
as
(
select row_number() over (order by Product, ValidFrom) as rn
,Product
,Value
,ValidFrom
,ValidTo
from #Prices
)
,cte
as
(
select rn
,Product
,Value
,ValidFrom
,ValidFrom as ValidFrom2
,ValidTo
from rownum
where rn = 1
union all
select r.rn
,r.Product
,r.Value
,r.ValidFrom
,case when c.Product = r.Product
then case when dateadd(d,1,c.ValidTo) = r.ValidFrom
then c.ValidFrom
else r.ValidFrom
end
else r.ValidFrom
end as ValidFrom2
,isnull(r.ValidTo,'29990101') as ValidTo
from rownum r
inner join cte c
on(r.rn = c.rn+1)
)
select Product
,Value
,ValidFrom2 as ValidFrom
,nullif(max(ValidTo),'29990101') as ValidTo
from cte
group by Product
,Value
,ValidFrom2
order by Product
,ValidFrom2;
Upvotes: 2