Reputation: 141
I have a table which has data arranged in a way that looks like this:
ID | BOUNDARY | TIMESTAMP
1 | NULL | 2016-01-01 00:20:00
2 | A | 2016-01-01 00:20:10
3 | A | 2016-01-01 00:20:14
4 | A | 2016-01-01 00:20:22
5 | NULL | 2016-01-01 00:20:38
6 | A | 2016-01-01 00:20:45
7 | B | 2016-01-01 00:21:02
8 | B | 2016-01-01 00:21:12
9 | A | 2016-01-01 00:21:16
10 | A | 2016-01-01 00:21:22
11 | C | 2016-01-01 00:21:30
12 | A | 2016-01-01 00:21:35
13 | A | 2016-01-01 00:21:40
14 | A | 2016-01-01 00:21:46
15 | A | 2016-01-01 00:21:50
What I would like to do is find an efficient way to flag the ID and timestamps for the start and end of a sequence in SQL Server 2014. A segment would be when a boundary is not null and repeats for at least two consecutive times. For example, the first segment would be from IDs 2-4 the second segment would be IDs 7-8, third would be 9-10.
The approach I tried first was to create two columns, a "startflag" col and an "endflag" column. I create an update query that correctly flags the start and end, but I'd like to create a view where I can see it as one record, like below:
BOUNDARY | START ID | END ID
A | 2 | 4
B | 7 | 8
A | 9 | 10
A | 12 | 15
Upvotes: 2
Views: 1011
Reputation: 26674
The key is to create island groupings by
ID
)Take a look at the example below:
declare @T table (ID int, BOUNDARY char(1), [TIMESTAMP] datetime2)
insert into @T values (1, null, '2016-01-01 00:20:00'), (2, 'A', '2016-01-01 00:20:10'), (3, 'A', '2016-01-01 00:20:14'), (4, 'A', '2016-01-01 00:20:22'), (5, null, '2016-01-01 00:20:38'), (6, 'A', '2016-01-01 00:20:45'), (7, 'B', '2016-01-01 00:21:02'), (8, 'B', '2016-01-01 00:21:12'), (9, 'A', '2016-01-01 00:21:16'), (10, 'A', '2016-01-01 00:21:22'), (11, 'C', '2016-01-01 00:21:30'), (12, 'A', '2016-01-01 00:21:35'), (13, 'A', '2016-01-01 00:21:40'), (14, 'A', '2016-01-01 00:21:46'), (15, 'A', '2016-01-01 00:21:50')
select
BOUNDARY,
min(ID) as [START ID],
max(id) as [END ID]
from
(
select
ID,
BOUNDARY,
ID -
row_number() over (partition by BOUNDARY order by TIMESTAMP) as grp
from @T as t
) as T
where BOUNDARY is not null
group by grp, BOUNDARY
having count(*) >= 2
order by min(ID)
Upvotes: 2
Reputation: 70648
Ok, I'm sure there are better ways to do this, but this works:
WITH CTE AS
(
SELECT *,
RN1 = ROW_NUMBER() OVER(ORDER BY [TIMESTAMP]),
RN2 = ROW_NUMBER() OVER(PARTITION BY BOUNDARY ORDER BY [TIMESTAMP])
FROM #YourTable
), CTE2 AS
(
SELECT *,
RN1-RN2 RN3,
COUNT(*) OVER(PARTITION BY RN1-RN2) N
FROM CTE
)
SELECT BOUNDARY,
MIN(ID) [START ID],
MAX(ID) [END ID]
FROM CTE2
WHERE N > 1
AND BOUNDARY IS NOT NULL
GROUP BY BOUNDARY, RN3
ORDER BY [START ID];
If we use this sample table:
CREATE TABLE #YourTable
([ID] int, [BOUNDARY] varchar(4), [TIMESTAMP] datetime)
;
INSERT INTO #YourTable
([ID], [BOUNDARY], [TIMESTAMP])
VALUES
(1, NULL, '2016-01-01 00:20:00'),
(2, 'A', '2016-01-01 00:20:10'),
(3, 'A', '2016-01-01 00:20:14'),
(4, 'A', '2016-01-01 00:20:22'),
(5, NULL, '2016-01-01 00:20:38'),
(6, 'A', '2016-01-01 00:20:45'),
(7, 'B', '2016-01-01 00:21:02'),
(8, 'B', '2016-01-01 00:21:12'),
(9, 'A', '2016-01-01 00:21:16'),
(10, 'A', '2016-01-01 00:21:22'),
(11, 'C', '2016-01-01 00:21:30'),
(12, 'A', '2016-01-01 00:21:35'),
(13, 'A', '2016-01-01 00:21:40'),
(14, 'A', '2016-01-01 00:21:46'),
(15, 'A', '2016-01-01 00:21:50')
;
The results are:
╔══════════╦══════════╦════════╗
║ BOUNDARY ║ START ID ║ END ID ║
╠══════════╬══════════╬════════╣
║ A ║ 2 ║ 4 ║
║ B ║ 7 ║ 8 ║
║ A ║ 9 ║ 10 ║
║ A ║ 12 ║ 15 ║
╚══════════╩══════════╩════════╝
Upvotes: 3