Bob
Bob

Reputation: 227

Find out the sequence number of a pattern

Consider below table

Table

ActivityId      Flag       Type
----------      -----      -----
1               N                              
2               N 
3               Y           EXT
4               Y
5               Y           
6               N
7               Y           INT
8               Y      
9               N
10              N
11              N
12              Y           EXT
13              N             
14              N
15              N
16              Y           EXT
17              Y           
18              Y           INT
19              Y           
20              Y           EXT
21              Y
22              N
23              N      

First record has always Flag = N and then any sequence of Flag = Y or Flag = N can exist for next records. Every time flag changes from N to Y, the Type field is either EXT or INT. Next Y records (before next N) might have Type = EXT or INT or NULL and this is not important.

I want to calculate Cycle No for this sequence of N/Y. First cycle starts when Flag = N (always first record has flag = N) and cycle ends when flag changes to Y and Type = EXT. Then, next cycle starts when flag changes to N and ends when flag becomes Y and type = EXT. this is repeated until all records are processed. The result for above table is:

Result

ActivityId      Flag       Type      Cycle No
----------      -----      -----     --------
1               N                       1       
2               N                       1
3               Y           EXT         1
4               Y
5               Y           
6               N                       2
7               Y           INT         2
8               Y                       2
9               N                       2
10              N                       2
11              N                       2
12              Y           EXT         2
13              N                       3
14              N                       3 
15              N                       3
16              Y           EXT         3
17              Y           
18              Y           INT
19              Y           
20              Y           EXT
21              Y
22              N                       4
23              N                       4 

I am using SQL Server 2008 R2 (no LAG/LEAD). Can you please help me find the SQL query to calculate Cycle No?

Upvotes: 3

Views: 468

Answers (2)

iamdave
iamdave

Reputation: 12243

If you are happy with recursion, this can be achieved rather simply with a bit of comparison logic to the preceding row when ordered by your ActivityId:

declare @t table(ActivityId int,Flag nvarchar(1),TypeValue nvarchar(3));
insert into @t values(1 ,'N',null),(2 ,'N',null),(3 ,'Y','EXT'),(4 ,'Y',null),(5 ,'Y',null),(6 ,'N',null),(7 ,'Y','INT'),(8 ,'Y',null),(9 ,'N',null),(10,'N',null),(11,'N',null),(12,'Y','EXT'),(13,'N',null),(14,'N',null),(15,'N',null),(16,'Y','EXT'),(17,'Y',null),(18,'Y','INT'),(19,'Y',null),(20,'Y','EXT'),(21,'Y',null),(22,'N',null),(23,'N',null);

with rn as    -- Derived table purely to guarantee incremental row number.  If you can guarantee your ActivityId values are incremental start to finish, this isn't required.
(   select row_number() over (order by ActivityId) as rn
            ,ActivityId
            ,Flag
            ,TypeValue
    from @t
),d as
(   select rn               -- Recursive CTE that compares the current row to the one previous.
            ,ActivityId
            ,Flag
            ,TypeValue
            ,cast(1 as decimal(10,5)) as CycleNo
    from rn
    where rn = 1

    union all

    select rn.rn
            ,rn.ActivityId
            ,rn.Flag
            ,rn.TypeValue
            ,cast(
                case when d.Flag = 'Y' and d.TypeValue = 'EXT' and d.CycleNo >= 1
                        then case when rn.Flag = 'N'
                                    then d.CycleNo + 1
                                    else (d.CycleNo + 1) * 0.0001    -- This part keeps track of the cycle number in fractional values, which can be removed by converting the final result to INT.
                                    end
                        else case when rn.Flag = 'N' and d.CycleNo < 1
                                    then d.CycleNo * 10000
                                    else d.CycleNo
                                    end
                        end
            as decimal(10,5)) as CycleNo
    from rn
        inner join d
            on d.rn = rn.rn - 1
)
select ActivityId
    ,Flag
    ,TypeValue
    ,cast(CycleNo as int) as CycleNo
from d
order by ActivityId;

Output:

+------------+------+-----------+---------+
| ActivityId | Flag | TypeValue | CycleNo |
+------------+------+-----------+---------+
|          1 | N    | NULL      |       1 |
|          2 | N    | NULL      |       1 |
|          3 | Y    | EXT       |       1 |
|          4 | Y    | NULL      |       0 |
|          5 | Y    | NULL      |       0 |
|          6 | N    | NULL      |       2 |
|          7 | Y    | INT       |       2 |
|          8 | Y    | NULL      |       2 |
|          9 | N    | NULL      |       2 |
|         10 | N    | NULL      |       2 |
|         11 | N    | NULL      |       2 |
|         12 | Y    | EXT       |       2 |
|         13 | N    | NULL      |       3 |
|         14 | N    | NULL      |       3 |
|         15 | N    | NULL      |       3 |
|         16 | Y    | EXT       |       3 |
|         17 | Y    | NULL      |       0 |
|         18 | Y    | INT       |       0 |
|         19 | Y    | NULL      |       0 |
|         20 | Y    | EXT       |       0 |
|         21 | Y    | NULL      |       0 |
|         22 | N    | NULL      |       4 |
|         23 | N    | NULL      |       4 |
+------------+------+-----------+---------+

Upvotes: 1

S&#248;ren Kongstad
S&#248;ren Kongstad

Reputation: 1440

I've got a solution, its not pretty, but through stepwise refinement I get to your result:

The solution is in three steps

  1. Isolate the activityid for all the possible start cycle and end cycle rows.
  2. Filter out all the dud start events
  3. Number the cycles, and find the interval of activityid for each cycle

First i select out all start and end cycle events:

with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))

,CTE1 as
( select
    ROW_NUMBER() over (order by t1.ActivityId) rn
    ,t1.ActivityId
    ,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
select * from cte1

This returns

rn  ActivityId  Cycle
1   1   Start
2   2   Start
3   3   End
4   6   Start
5   9   Start
6   10  Start
7   11  Start
8   12  End
9   13  Start
10  14  Start
11  15  Start
12  16  End
13  20  End
14  22  Start
15  23  Start

The problem is now that while we are sure of when the cycle ends, that is when Flag is N and Type is Ext, we are not sure when the cycle starts. Row 1 and 2 both denote a possible start event. But luckily we can see that only a start event following an End event is to be counted. Since we do not have lag or lead we must join the CTE with itself:

,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
       ,a1.ActivityId
       ,a1.Cycle
       ,a2.Cycle PrevCycle
from  CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1 
where 

    a2.Cycle is null -- First Cycle
    or
    ( a2.Cycle is not null
    and
    ( 
        (a1.Cycle='End' and a2.Cycle='Start') -- End of cycle
        or 
        (a1.Cycle='Start'  
         and a2.Cycle='End') -- next cycles
    )
    )
    )
select * from cte2

This returns

rn  ActivityId  Cycle   PrevCycle
1   1   Start   NULL
2   3   End Start
3   6   Start   End
4   12  End Start
5   13  Start   End
6   16  End Start
7   22  Start   End

I Select the first start event - since we always start with one, and then keep the END events that follow a start event. Finally we only keep the rest of the start events if the previous event was an End event.

Now we can find the start and end of each cycle, and number them:

,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
    ,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)
select * from cte3

Which gives us what we need:

CycleNumber StartId EndId
1   1   3
2   6   12
3   13  16
4   22  NULL

Now we just need to join this back on our table:

select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)

This gives the result you were looking for.

This is just a quick and dirty solution, perhaps with a little TLC you can pretty it up, and reduce the number of steps.

The full solution is here:

with tab as
(select * from (values
(1,'N',''),(2,'N',''),(3,'Y','EXT'),(4,'Y','')
,(5,'Y',''),(6,'N',''),(7,'Y','INT'),(8,'Y','')
,(9,'N',''),(10,'N',''),(11,'N',''),(12,'Y','EXT')
,(13,'N',''),(14,'N',''),(15,'N',''),(16,'Y','EXT')
,(17,'Y',''),(18,'Y','INT'),(19,'Y',''),(20,'Y','EXT')
,(21,'Y',''),(22,'N',''),(23,'N','')) a(ActivityId,Flag,[Type]))

,CTE1 as
( select
    ROW_NUMBER() over (order by t1.ActivityId) rn
    ,t1.ActivityId
    ,case when t1.Flag='N' then 'Start' else 'End' end Cycle
from tab t1
where t1.Flag='N' or (t1.Flag='Y' and t1.[Type]='Ext')
)
,CTE2 as
(
select ROW_NUMBER() over (order by a1.activityid) rn
       ,a1.ActivityId
       ,a1.Cycle
       ,a2.Cycle PrevCycle
from  CTE1 a1 left join CTE1 a2 on a1.rn=a2.rn+1 
where 

    a2.Cycle is null -- First Cycle
    or
    ( a2.Cycle is not null
    and
    ( 
        a1.Cycle='End' -- End of cycle
        or 
        (a1.Cycle='Start'  
         and a2.Cycle='End') -- next cycles
    )
    )
    )

,cte3 as
(
select ROW_NUMBER() over (order by b1.ActivityId) CycleNumber
    ,b1.ActivityId StartId,b2.ActivityId EndId
from cte2 b1 left join cte2 b2
on b1.rn=b2.rn-1
where b1.Cycle='Start'
)

select
a.ActivityId,a.Flag,a.[Type],CycleNumber
from tab a
left join cte3 b on a.ActivityId between b.StartId and isnull(b.EndId,a.ActivityId)

Upvotes: 3

Related Questions