sirokinl
sirokinl

Reputation: 51

T-SQL: Timeseries filling ranges

I have this dataset where I have a time-series with in the YYYYMM format. I have two columns which basically as true/false flags. I would like to add two extra columns based on these true/false flags that retrieves the current range:

        Default  Cure
201301  0        NULL
201302  0        NULL
201303  0        NULL
201304  1        NULL
201305  1        NULL
201306  1        NULL
201307  1        NULL
201308  NULL     0
201309  NULL     0
201310  NULL     1
201311  0        NULL
201312  0        NULL
201401  0        NULL
201402  0        NULL
201403  1        NULL
201404  1        NULL
201405  0        NULL
201406  0        NULL
201407  NULL     1
201408  NULL     0
201409  NULL     0
201410  NULL     0
201411  NULL     0
201412  NULL     0

I this dataset you can see the Default column being set to 1 for the periods 201304, 05, 06, 07 and the Cure column is set to 1 in the period 201310.

This basically means the Default timeseries is valid from period 201304 until period 201310. Ultimately I would like to generate the following set:

        Default  Cure  DefaultPeriod  CurePeriod
201301  0        NULL  NULL           NULL
201302  0        NULL  NULL           NULL
201303  0        NULL  NULL           NULL
201304  1        NULL  201304         201310
201305  1        NULL  201304         201310
201306  1        NULL  201304         201310
201307  1        NULL  201304         201310
201308  NULL     0     201304         201310
201309  NULL     0     201304         201310
201310  NULL     1     201304         201310
201311  0        NULL  NULL           NULL
201312  0        NULL  NULL           NULL
201401  0        NULL  NULL           NULL
201402  0        NULL  NULL           NULL
201403  1        NULL  201403         201407
201404  1        NULL  201403         201407
201405  0        NULL  201403         201407
201406  0        NULL  201403         201407
201407  NULL     1     201403         201407
201408  NULL     0     NULL           NULL
201409  NULL     0     NULL           NULL
201410  NULL     0     NULL           NULL
201411  NULL     0     NULL           NULL
201412  NULL     0     NULL           NULL

Multiple ranges can occur but they cannot overlap. How would I go about achieving this. I have tried to do all sorts of min/max period join on the same table, but I can't seem to find a working solution.

Upvotes: 0

Views: 66

Answers (1)

James Casey
James Casey

Reputation: 2507

This was a real thinker :)

Basically I am dividing up the data on the "Cure" dates (c1), numbering each group(c2), then looking for mins and maxes within each group (c3 C4), then applying some logic to filter out the rows that come before the min.

declare @t table
(
    [Month] varchar(6),
    [Default] bit,
    [Cure] bit
);

insert into @t values('201301',  0,        NULL);
insert into @t values('201302',  0,        NULL);
insert into @t values('201303',  0,        NULL);
insert into @t values('201304',  1,        NULL);
insert into @t values('201305',  1,        NULL);
insert into @t values('201306',  1,        NULL);
insert into @t values('201307',  1,        NULL);
insert into @t values('201308',  NULL,     0);
insert into @t values('201309',  NULL,     0);
insert into @t values('201310',  NULL,     1);
insert into @t values('201311',  0,        NULL);
insert into @t values('201312',  0,        NULL);
insert into @t values('201401',  0,        NULL);
insert into @t values('201402',  0,        NULL);
insert into @t values('201403',  1,        NULL);
insert into @t values('201404',  1,        NULL);
insert into @t values('201405',  0,        NULL);
insert into @t values('201406',  0,        NULL);
insert into @t values('201407',  NULL,     1);
insert into @t values('201408',  NULL,     0);
insert into @t values('201409',  NULL,     0);
insert into @t values('201410',  NULL,     0);
insert into @t values('201411',  NULL,     0);
insert into @t values('201412',  NULL,     0);


with c1 as
(
    select min([Month]) [Month], 1 x from @t
    union all 
    select [Month],1 from @t
    where Cure = 1
),
c2 as 
(
    select t.[Month],[Default],[Cure],
        sum(x) over (order by t.[Month] rows between unbounded preceding and 1 preceding) grp
    from @t t
    left outer join c1 on c1.[Month] = t.[Month]
),
c3 as
(
    select grp, min([Month]) [Month]
    from c2
    where [Default] = 1
    group by grp
),
c4 as 
(
    select grp, max([Month]) [Month]
    from c2
    where [Cure] = 1
    group by grp
)
select c2.[Month], c2.[Default], c2.[Cure], 
    case when c2.[Month] >= c3.[Month] then c3.[Month] else null end as DefaultPeriod,
    case when c2.[Month] >= c3.[Month] then c4.[Month] else null end as CurePeriod
from c2
left outer join c3 on c2.grp = c3.grp
left outer join c4 on c2.grp = c4.grp

Upvotes: 1

Related Questions