Identifying Sequence Start and Ends in SQL Server

Question

I have a table which has data arranged in a way that looks like this:

ID |  BOUNDARY  | TIMESTAMP
1  |  NULL      | 2016-01-01 00:20:00
2  |  A         | 2016-01-01 00:20:10
3  |  A         | 2016-01-01 00:20:14
4  |  A         | 2016-01-01 00:20:22
5  | NULL       | 2016-01-01 00:20:38
6  |  A         | 2016-01-01 00:20:45
7  |  B         | 2016-01-01 00:21:02
8  |  B         | 2016-01-01 00:21:12
9  |  A         | 2016-01-01 00:21:16
10 |  A         | 2016-01-01 00:21:22
11 |  C         | 2016-01-01 00:21:30
12 |  A         | 2016-01-01 00:21:35
13 |  A         | 2016-01-01 00:21:40
14 |  A         | 2016-01-01 00:21:46
15 |  A         | 2016-01-01 00:21:50

What I would like to do is find an efficient way to flag the ID and timestamps for the start and end of a sequence in SQL Server 2014. A segment would be when a boundary is not null and repeats for at least two consecutive times. For example, the first segment would be from IDs 2-4 the second segment would be IDs 7-8, third would be 9-10.

The approach I tried first was to create two columns, a "startflag" col and an "endflag" column. I create an update query that correctly flags the start and end, but I'd like to create a view where I can see it as one record, like below:

BOUNDARY | START ID | END ID  
A        |   2      |  4
B        |   7      |  8
A        |   9      |  10
A        |   12     |  15

Lamak · Accepted Answer

Ok, I'm sure there are better ways to do this, but this works:

WITH CTE AS
(
    SELECT  *,
            RN1 = ROW_NUMBER() OVER(ORDER BY [TIMESTAMP]),
            RN2 = ROW_NUMBER() OVER(PARTITION BY BOUNDARY ORDER BY [TIMESTAMP])
    FROM #YourTable
), CTE2 AS
(
    SELECT  *, 
            RN1-RN2 RN3,
            COUNT(*) OVER(PARTITION BY RN1-RN2) N
    FROM CTE
)
SELECT  BOUNDARY,
        MIN(ID) [START ID],
        MAX(ID) [END ID]
FROM CTE2
WHERE N > 1
AND BOUNDARY IS NOT NULL
GROUP BY BOUNDARY, RN3
ORDER BY [START ID];

If we use this sample table:

CREATE TABLE #YourTable
    ([ID] int, [BOUNDARY] varchar(4), [TIMESTAMP] datetime)
;

INSERT INTO #YourTable
    ([ID], [BOUNDARY], [TIMESTAMP])
VALUES
    (1, NULL, '2016-01-01 00:20:00'),
    (2, 'A', '2016-01-01 00:20:10'),
    (3, 'A', '2016-01-01 00:20:14'),
    (4, 'A', '2016-01-01 00:20:22'),
    (5, NULL, '2016-01-01 00:20:38'),
    (6, 'A', '2016-01-01 00:20:45'),
    (7, 'B', '2016-01-01 00:21:02'),
    (8, 'B', '2016-01-01 00:21:12'),
    (9, 'A', '2016-01-01 00:21:16'),
    (10, 'A', '2016-01-01 00:21:22'),
    (11, 'C', '2016-01-01 00:21:30'),
    (12, 'A', '2016-01-01 00:21:35'),
    (13, 'A', '2016-01-01 00:21:40'),
    (14, 'A', '2016-01-01 00:21:46'),
    (15, 'A', '2016-01-01 00:21:50')
;

The results are:

╔══════════╦══════════╦════════╗
║ BOUNDARY ║ START ID ║ END ID ║
╠══════════╬══════════╬════════╣
║ A        ║        2 ║      4 ║
║ B        ║        7 ║      8 ║
║ A        ║        9 ║     10 ║
║ A        ║       12 ║     15 ║
╚══════════╩══════════╩════════╝

Identifying Sequence Start and Ends in SQL Server

Answers (2)

Related Questions