Radu Gheorghiu
Radu Gheorghiu

Reputation: 20509

Query to identify contiguous ranges

I'm trying to write a query on the below data set to add a new column which has some sort of "period_id_group".

contiguous  new_period  row_nr  new_period_starting_id
0           0           1       0
1           1           2       2
1           0           3       0
1           0           4       0
1           1           5       5
1           0           6       0

What I'm trying to get is:

contiguous  new_period  row_nr  new_period_starting_id    period_id_group
0           0           1       0                         0
1           1           2       2                         2
1           0           3       0                         2
1           0           4       0                         2
1           1           5       5                         5
1           0           6       0                         5

The logic is that for each 0 value in the new_period_starting_id, it has to get the >0 value from the row above.

So, for row_nr = 1 since there is no row before it, period_id_group is 0.

For row_nr = 2 since this is a new perid (marked by new_period = 1), the period_id_group is 2 (the id of this row).

For row_nr = 3 since it's part of a contiguous range (because contiguous = 1), but is not the start of the range, because it's not a new_period (new_period = 0), its period_id_group should inherit the value from the previous row (which is the start of the contiguous range) - in this case period_id_group = 2 also.

I've tried multiple versions but couldn't get a good solution for SQL Server 2008R2, since I can't use LAG().

What I have, so far, is a shameful:

select *
from #temp2 t1
left join (select distinct new_period_starting_id from #temp2) t2 
    on t1.new_period_starting_id >= t2.new_period_starting_id
where 1 = case 
            when contiguous = 0 
                then 1
            when contiguous = 1 and t2.new_period_starting_id > 0
                then 1
            else 1
        end
order by t1.rn

Sample data script:

declare @tmp2 table (contiguous int
                   , new_period int
                   , row_nr int
                   , new_period_starting_id int);

insert into @tmp2 values (0, 0, 1, 0)
                        , (1, 1, 2, 2)
                        , (1, 0, 3, 0)
                        , (1, 0, 4, 0)
                        , (1, 1, 5, 5)
                        , (1, 0, 6, 0);

Any help is appreciated.

Upvotes: 0

Views: 313

Answers (3)

Sean Lange
Sean Lange

Reputation: 33581

Here is yet another option for this.

select t1.contiguous
    , t1.new_period
    , t1.row_nr
    , t1.new_period_starting_id
    , x.new_period_starting_id
from @tmp2 t1
outer apply
(
    select top 1 *
    from @tmp2 t2
    where (t2.row_nr = 1
        or t2.new_period_starting_id > 0)
        and t1.row_nr >= t2.row_nr
    order by t2.row_nr desc
) x

Upvotes: 1

Radu Gheorghiu
Radu Gheorghiu

Reputation: 20509

Found the solution:

select *
    , case
        when contiguous = 0
            then f1
        when contiguous = 1 and new_periods = 1
            then f1
        when contiguous = 1 and new_periods = 0
            then v
        else NULL
    end [period_group]
from (
    select *
        , (select max(f1) from #temp2 where new_period_starting_id > 0 and rn < t1.rn) [v]
    from #temp2 t1
    ) rs
order by rn

Upvotes: 0

John Pasquet
John Pasquet

Reputation: 1842

So, if I'm understanding you correctly, you just need one additional column.

SELECT t1.contiguous, t1.new_period, t1.row_nr, t1.new_period_starting_id,
    (SELECT TOP 1 (new_period_starting_id) 
     FROM YourTable t2
     WHERE t2.row_nr <= t1.row_nr
         AND t2.period_id_group > 0 /* optimization */
     ORDER BY t2.row_nr DESC /* optimization */) AS period_id_group
FROM YourTable t1

Upvotes: 1

Related Questions