NS.X.
NS.X.

Reputation: 2142

Group rows into sequences using a sliding window on a DateTime column

I have a table that stores timestamped events. I want to group the events into 'sequences' by using 5-min sliding window on the timestamp column, and write the 'sequence ID' (any ID that can distinguish sequences) and 'order in sequence' into another table.

Input - event table:

+----+-------+-----------+
| Id | Name  | Timestamp |
+----+-------+-----------+
|  1 | test  | 00:00:00  |
|  2 | test  | 00:06:00  |
|  3 | test  | 00:10:00  |
|  4 | test  | 00:14:00  |
+----+-------+-----------+

Desired output - sequence table. Here SeqId is the ID of the starting event, but it doesn't have to be, just something to uniquely identify a sequence.

+---------+-------+----------+
| EventId | SeqId | SeqOrder |
+---------+-------+----------+
|       1 |     1 |        1 |
|       2 |     2 |        1 |
|       3 |     2 |        2 |
|       4 |     2 |        3 |
+---------+-------+----------+

What would be the best way to do it? This is MSSQL 2008, I can use SSAS and SSIS if they make things easier.

Upvotes: 0

Views: 527

Answers (2)

Lmu92
Lmu92

Reputation: 952

CREATE TABLE #Input (Id INT, Name VARCHAR(20), Time_stamp TIME)
INSERT INTO #Input
VALUES
(  1 ,'test','00:00:00'  ),
(  2 ,'test','00:06:00'  ),
(  3 ,'test','00:10:00'  ),
(  4 ,'test','00:14:00'  )

SELECT * FROM #Input;

WITH cte AS -- add a sequential number
(
    SELECT *, 
    ROW_NUMBER() OVER(ORDER BY Id) AS sort
    FROM #Input
), cte2 as -- find the Id's with a difference of more than 5min 
(
    SELECT cte.*,
    CASE WHEN DATEDIFF(MI, cte_1.Time_stamp,cte.Time_stamp) < 5 THEN 0 ELSE 1 END as GrpType
    FROM cte
    LEFT OUTER JOIN 
    cte as cte_1 on cte.sort =cte_1.sort +1
), cte3 as -- assign a SeqId
(
    SELECT GrpType, Time_Stamp,ROW_NUMBER() OVER(ORDER BY Time_stamp) SeqId
    FROM cte2 
    WHERE GrpType = 1

), cte4 as -- find the Time_Stamp range per SeqId
(
    SELECT cte3.*,cte_2.Time_stamp as TS_to
    FROM cte3
    LEFT OUTER JOIN 
    cte3 as cte_2 on cte3.SeqId =cte_2.SeqId -1
)
-- final query
SELECT 
    t.Id, 
    cte4.SeqId, 
    ROW_NUMBER() OVER(PARTITION BY cte4.SeqId ORDER BY t.Time_stamp) AS SeqOrder
FROM cte4 INNER JOIN #Input t ON t.Time_stamp>=cte4.Time_stamp AND (t.Time_stamp <cte4.TS_to OR  cte4.TS_to IS NULL);

This code is slightly more complex but it returns the expected output (which Gordon Linoffs solution doesn't...) and it's even slightly faster.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271231

You seem to want things grouped together when they are less than five minutes apart. You can assign the groups by getting the previous time stamp and marking the beginning of a group. You then need to do a cumulative sum to get the group id:

with e as (
      select e.*,
             (case when datediff(minute, prev_timestamp, timestamp) < 5 then 1 else 0 end) as flag
      from (select e.*,
                   (select top 1 e2.timestamp
                    from events e2
                    where e2.timestamp < e.timestamp
                     order by e2.timestamp desc
                   ) as prev_timestamp
            from events e
           ) e
     )
select e.eventId, e.seqId,
       row_number() over (partition by seqId order b timestamp) as seqOrder
from (select e.*, (select sum(flag) from e e2 where e2.timestamp <= e.timestamp) as seqId
      from e
     ) e;

By the way, this logic is easier to express in SQL Server 2012+ because the window functions are more powerful.

Upvotes: 1

Related Questions