Reputation: 4085
This questions is a simplified version of a problem I have at work. I'm using a custom database that supports all standard SQL queries. I suppose this question applies to any RDMS that supports SQL though.
Suppose I have the table events
, including three columns: event_type
(string), details
(string), and timestamp
(integer).
One of the events of interest is power outage. The system logs the power status every once in a while. If there's a power outage, "outage" will be logged. Otherwise, "good" will be logged.
Another event of interest is access. Whenever somebody accesses a room, the room that was entered will be logged as well.
I want to select all rooms entered during a power outages.
For example, I have the following rows.
event_type | details | timestamp
---------------------------------------
power | good | 14
access | room 6 | 13
power | good | 12
access | room 5 | 11
access | room 4 | 10
power | outage | 9
power | outage | 8
access | room 3 | 7
power | outage | 6
access | room 2 | 5
power | good | 4
access | room 1 | 3
power | outage | 2
access | room 0 | 1
power | good | 0
I want to get the following rows.
event_type | details | timestamp
---------------------------------------
power | good | 12
access | room 5 | 11
access | room 4 | 10
power | outage | 9
power | outage | 8
access | room 3 | 7
power | outage | 6
power | good | 4
access | room 1 | 3
power | outage | 2
The current way I'm doing this feels like I'm not using SQL the right way.
First I get the timestamp for a power outage. Next, I get the minimum timestamp that the power came back after that outage. Finally, I select all events between the two timestamps. After that, I find the next outage timestamp after the power came back. Repeat.
Another way I did it was to just download all the rows to a flat file, then use a Python script to filter out the rows I want in a procedural way.
Is there a better way to get the rows I need?
Upvotes: 1
Views: 1324
Reputation: 1269583
You can identify all rows where the previous "power" row was for "outage" and not "good".
I think that gets you the information you want.
select e.*
from events e
where 'outage' = (select e2.details
from events e2
where e2.event_type = 'power' and
e2.timestamp < e.timestamp
order by e2.timestamp desc
limit 1
) or
e.event_type = 'power';
Upvotes: 2
Reputation: 1218
Try the code below. You may need to play with defaults for NULL values returned by the first select statement inside the where for the case when the power goes off and hasn't come back on yet (do you want this to show up or not?). This is most likely less efficient than implementations that utilize more powerful functions (limit, row_number etc).
SELECT event_type, details, timestamp
FROM events t1
WHERE
(event_type = 'power' AND details = 'outage')
OR
(SELECT MAX(timestamp)
FROM events t2
WHERE t2.event_type = 'power' AND
t2.details = 'outage' AND
t2.timestamp < t1.timestamp)
>
(SELECT MAX(timestamp)
FROM events t2
WHERE t2.event_type = 'power' AND
t2.details = 'good' AND
t2.timestamp < t1.timestamp)
Upvotes: 0
Reputation: 50201
Here is a working query for you (SQL Fiddle):
SELECT
E.*
FROM
dbo.Events E
WHERE
E.event_type IN ('access', 'power')
AND (
SELECT TOP 1 details
FROM dbo.Events E2
WHERE
E.timestampid >= E2.timestampid
AND E2.event_type = 'power'
ORDER BY E2.timestampid DESC
) = 'outage'
;
I used SQL Server syntax but it's just as easy to pop a LIMIT 1
in there for MySQL. Regarding windowing functions, I don't have the time or energy to come up with the answer at the moment, but if you'll refer to this question and my answer there you will see one of the techniques I've used for a similar problem.
Also, note that how to handle the data at the beginning of the set is a question. If the first events were { access, room 0 }, { power, outage }
, would you want the room 0
row to be in the result set or not? Even { access, room 0 }, { power, good }
might also be a candidate for inclusion at the start of the set, since it could be the case that the power was out at the start--we don't have that information in this data. If you want to include rows in either of these cases, please let me know and I can update my query.
Upvotes: 2