Reputation: 45
In SQL Server, I have a set of time series on/off events which look like this (for simplicity I have only shown one Alarm Number, but there are many in the same table):
'Alarms' Table:
AlarmNumber Time AlarmState
1592 2020-01-02 01:52:02 1
1592 2020-01-02 01:58:07 0
1592 2020-04-28 03:46:49 1
1592 2020-04-28 06:19:10 0
1592 2020-06-04 00:25:22 1
1592 2020-08-27 01:57:03 1
1592 2020-08-27 05:16:32 0
1592 2020-09-17 02:51:57 0
I am trying to convert these into on/off pairs.
Output I am trying to achieve, ideally as an SQL View:
AlarmNumber StartTime EndTime
1592 2020-01-02 01:52:02 2020-01-02 01:58:07
1592 2020-04-28 03:46:49 2020-04-28 06:19:10
1592 2020-06-04 00:25:22 NULL
1592 2020-08-27 01:57:03 2020-08-27 05:16:32
1592 NULL 2020-09-17 02:51:57
If I had a clean data set, with no missing 'On' or 'Off' events, I could achieve this with:
select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
select AlarmNumber, Time,
ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
select AlarmNumber, Time,
ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)
(Code modified from Adriano Carneiro's answer at "T-SQL Start and end date times from a single column".)
My question: can anyone think of an efficient way of processing the 'Alarms' table to achieve my sample output, one which handles missing on/off events (shown as the NULLs in the sample output)?
My backup is to use a cursor and a WHILE loop, but I was hoping there may be a way of doing it by grouping pairs of on/off events together; I just haven't been able to get it working. I have 500k+ events, so it is a large data set to iterate over.
Any ideas welcome!
Thank you, Thomas
------ UPDATE 1st Nov 2020 ------
Two great solutions have been provided. Both work correctly and give identical results on a sample of 80,000 rows of messy real-world data.
Upvotes: 4
Views: 448
Reputation: 43656
Once the rows are numbered in order, just SELECT them in parts and combine the results using UNION ALL:
DECLARE @DataSource TABLE
(
[AlarmNumber] INT
,[Time] DATETIME2(0)
,[AlarmState] INT
);
INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
,(1592, '2020-01-02 01:58:07', 0)
,(1592, '2020-04-28 03:46:49', 1)
,(1592, '2020-04-28 06:19:10', 0)
,(1592, '2020-06-04 00:25:22', 1)
,(1592, '2020-08-27 01:57:03', 1)
,(1592, '2020-08-27 05:16:32', 0)
,(1592, '2020-09-17 02:51:57', 0);
-- Add a rowID column to the data
WITH DataSource AS
(
SELECT * ,ROW_NUMBER() Over(Partition by AlarmNumber order by [Time]) rowID
FROM @DataSource
)
-- This is just here so we can sort the result at the end
SELECT * FROM (
-- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
-- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
SELECT DS1.AlarmNumber
,DS1.Time As StartTime
,DS2.Time As EndTime
FROM DataSource DS1
LEFT JOIN DataSource DS2
ON DS1.[rowID] = DS2.[rowID] - 1
AND DS1.AlarmNumber = DS2.AlarmNumber
AND DS2.[AlarmState] = 0
WHERE DS1.[AlarmState] = 1
UNION ALL
-- Select rows of DataSource where there is an OFF and there is no matching ON (aka it turned OFF without ever turning ON)
SELECT DS2.AlarmNumber
,NULL As StartTime
,DS2.Time As EndTime
FROM DataSource DS2
INNER JOIN DataSource DS1
ON DS2.[rowID] -1 = DS1.[rowID]
AND DS1.[AlarmState] = 0
AND DS2.AlarmNumber = DS1.AlarmNumber
WHERE DS2.[AlarmState] = 0
UNION ALL
-- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
SELECT DS1.AlarmNumber
,NULL As StartTime
,DS1.Time As EndTime
FROM DataSource DS1
WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
) z
ORDER BY COALESCE(StartTime,EndTime), AlarmNumber
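The same three cases can also be expressed without self-joins by looking at the neighbouring rows with LAG and LEAD. The following is only a sketch under some assumptions: SQL Server 2012 or later, no two events for the same alarm sharing a timestamp, and a table variable named @Alarms (a hypothetical stand-in holding the sample data from the question). The names Ordered, PrevState, NextState and NextTime are made up for illustration.

```sql
-- Sample data copied from the question; @Alarms is a hypothetical stand-in for the real table.
DECLARE @Alarms TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);
INSERT INTO @Alarms ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

-- Attach the previous and next event of the same alarm to every row
WITH Ordered AS
(
    SELECT [AlarmNumber], [Time], [AlarmState]
          ,LAG([AlarmState])  OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS PrevState
          ,LEAD([AlarmState]) OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS NextState
          ,LEAD([Time])       OVER (PARTITION BY [AlarmNumber] ORDER BY [Time]) AS NextTime
    FROM @Alarms
)
-- Wrapped in a derived table so the result can be sorted on an expression
SELECT * FROM (
    -- Every ON event starts a pair; it only gets an EndTime if the very next event is an OFF
    SELECT [AlarmNumber]
          ,[Time] AS StartTime
          ,CASE WHEN NextState = 0 THEN NextTime END AS EndTime
    FROM Ordered
    WHERE [AlarmState] = 1
    UNION ALL
    -- An OFF event that does not directly follow an ON (first row, or previous row is also OFF)
    SELECT [AlarmNumber]
          ,CAST(NULL AS DATETIME2(0)) AS StartTime
          ,[Time] AS EndTime
    FROM Ordered
    WHERE [AlarmState] = 0
      AND (PrevState IS NULL OR PrevState = 0)
) z
ORDER BY COALESCE(StartTime, EndTime), AlarmNumber;
```

On the sample data this should return the same five rows as the query above. Whether it is faster on 500k+ rows is not something this sketch establishes; comparing the execution plans on the real table would be the way to find out.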
Upvotes: 2
Reputation: 222622
A group is made of two consecutive rows, where the first has status 1 and the second has status 0. I would approach this with window functions, like this:
select
alarmnumber,
max(case when alarmstate = 1 then time end) start_time,
max(case when alarmstate = 0 then time end) end_time
from (
select a.*,
sum(case when alarmstate = 0 and lag_alarmstate = 1 then 0 else 1 end)
over(partition by alarmnumber order by time) grp
from (
select a.*,
lag(alarmstate) over(partition by alarmnumber order by time) lag_alarmstate
from alarms a
) a
) a
group by alarmnumber, grp
This uses lag() to retrieve the "previous" state, and a cumulative sum to define the groups. The last step is conditional aggregation.
alarmnumber | start_time | end_time
:---------- | :---------------------- | :----------------------
1592 | 2020-01-02 01:52:02.000 | 2020-01-02 01:58:07.000
1592 | 2020-04-28 03:46:49.000 | 2020-04-28 06:19:10.000
1592 | 2020-06-04 00:25:22.000 | null
1592 | 2020-08-27 01:57:03.000 | 2020-08-27 05:16:32.000
1592 | null | 2020-09-17 02:51:57.000
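To see how the groups come about, it can help to run only the two inner steps of the query above and look at the intermediate columns. This is a sketch for inspection only; @Alarms is a hypothetical table variable holding the sample data from the question, standing in for the real alarms table.

```sql
-- Sample data copied from the question; @Alarms is a hypothetical stand-in for the alarms table.
DECLARE @Alarms TABLE (AlarmNumber INT, [Time] DATETIME2(0), AlarmState INT);
INSERT INTO @Alarms (AlarmNumber, [Time], AlarmState)
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-04-28 03:46:49', 1)
      ,(1592, '2020-04-28 06:19:10', 0)
      ,(1592, '2020-06-04 00:25:22', 1)
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0);

SELECT a.AlarmNumber, a.[Time], a.AlarmState, a.lag_alarmstate,
       -- The counter increases on every row except an OFF that directly follows an ON,
       -- so such an OFF keeps the group number of the ON before it
       SUM(CASE WHEN a.AlarmState = 0 AND a.lag_alarmstate = 1 THEN 0 ELSE 1 END)
           OVER (PARTITION BY a.AlarmNumber ORDER BY a.[Time]) AS grp
FROM (
    -- State of the previous event for the same alarm (NULL on the first row)
    SELECT AlarmNumber, [Time], AlarmState,
           LAG(AlarmState) OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS lag_alarmstate
    FROM @Alarms
) a
ORDER BY a.AlarmNumber, a.[Time];
```

For the eight sample rows, grp comes out as 1, 1, 2, 2, 3, 4, 4, 5. Groups 1, 2 and 4 hold an ON followed by its OFF, group 3 holds a lone ON, and group 5 holds a lone OFF, which is why the conditional aggregation yields NULL for the missing side.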
Upvotes: 3