Reputation: 8325
We have a table that stores events (alerts produced by machine learning). Each alert carries information about a failure that may happen in the future. Alerts that meet certain criteria are special: planned maintenance is scheduled on the basis of such an alert in order to avoid the future failure.
The table schema is similar to:
| id | fin  | datetime   | failure |
|----|------|------------|---------|
| 1  | fin1 | 2016-04-02 | 0       |
| 2  | fin1 | 2016-05-02 | 1       |
| 3  | fin2 | 2017-05-02 | 0       |
| 4  | fin2 | 2017-05-12 | 0       |
| 5  | fin2 | 2017-05-01 | 1       |
| 6  | fin3 | 2017-12-01 | 1       |
where fin describes the part which failed or may fail in the near future, and failure is a binary flag that indicates whether the event is an alert (0) or an actual failure (1). We need both in the same table.
What we want is to get all alerts that are labeled as planned maintenance and, for each such alert, the next failure after it for the same FIN as the alert.
Note that we have the ids of the alerts that are labeled as planned maintenance from another table, based on another evaluation process. You can consider this as given, e.g. as a list planned_ids (for example, this list may be [1,3] to match the sample data above).
Right now we solve this problem by making multiple requests to the DB (MySQL, but that is not very important). The first request gets all events that are labeled as planned maintenance (we have the ids of those events) by using, for example,
select *
from events
where id in (planned_ids) -- placeholder for the list of ids, e.g. (1, 3)
order by datetime asc -- we don't mind if this is ordered desc as well
Then for each such alert we fetch the next failure for same FIN (functional part) that comes immediately AFTER that event and is a failure. For example by using:
-- using $alert as kind of variable here to denote that the query
-- runs for each alert based on that same alert data i.e datetime and fin part
select *
from events
where fin=$alert.fin and datetime>=$alert.datetime and failure=1
order by datetime asc -- ascending, so the earliest matching failure comes first
limit 0,1
This works, but it is a kind of N+1 problem.
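To make the N+1 pattern concrete, here is a minimal sketch of the two-step approach described above. It is only an illustration: it assumes Python with an in-memory SQLite database standing in for MySQL, it loads the sample rows from the table above, and it orders the per-alert query ascending so that the earliest matching failure is returned.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are the sample data above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (id integer primary key, fin text, datetime text, failure integer)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        (1, "fin1", "2016-04-02", 0),
        (2, "fin1", "2016-05-02", 1),
        (3, "fin2", "2017-05-02", 0),
        (4, "fin2", "2017-05-12", 0),
        (5, "fin2", "2017-05-01", 1),
        (6, "fin3", "2017-12-01", 1),
    ],
)

planned_ids = [1, 3]  # given by the separate evaluation process
placeholders = ", ".join("?" for _ in planned_ids)

# Query 1: fetch the planned-maintenance alerts.
alerts = conn.execute(
    f"select id, fin, datetime from events where id in ({placeholders}) "
    "order by datetime asc",
    planned_ids,
).fetchall()

# Queries 2..N+1: one extra round trip per alert.
next_failure = {}
for alert_id, fin, dt in alerts:
    row = conn.execute(
        "select id from events "
        "where fin = ? and datetime >= ? and failure = 1 "
        "order by datetime asc limit 1",
        (fin, dt),
    ).fetchone()
    next_failure[alert_id] = row[0] if row else None

print(next_failure)
```

With the sample data, alert 1 maps to failure 2, while alert 3 has no later failure for fin2 (the only fin2 failure is dated before it), so it maps to None.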
Can we fetch the alerts together with the single next failure for each alert in one SQL query?
Upvotes: 1
Views: 184
Reputation: 1269443
A correlated subquery, perhaps:
select e.*,
       (select e2.id
        from events e2
        where e2.fin = e.fin
          and e2.datetime > e.datetime
          and e2.failure = 1         -- only actual failures
        order by e2.datetime asc     -- earliest later failure first
        limit 1
       ) as next_event_id
from events e
where e.id in ( . . . )
If you want additional information, you can use this as a subquery and join back to the events table.
select p.*, nexte.*
from (select e.*,
             (select e2.id
              from events e2
              where e2.fin = e.fin
                and e2.datetime > e.datetime
                and e2.failure = 1         -- only actual failures
              order by e2.datetime asc     -- earliest later failure first
              limit 1
             ) as next_event_id
      from events e
      where e.id in ( . . . )
     ) p left join
     events nexte
     on nexte.id = p.next_event_id;
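As a rough check of the single-query idea, the sketch below runs a correlated subquery with a join back to events against the sample data from the question. It is an illustration under assumptions: Python with an in-memory SQLite database stands in for MySQL, the id list is bound as [1, 3], and the subquery filters on failure = 1 and orders ascending so that the earliest later failure is picked.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (id integer primary key, fin text, datetime text, failure integer)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        (1, "fin1", "2016-04-02", 0),
        (2, "fin1", "2016-05-02", 1),
        (3, "fin2", "2017-05-02", 0),
        (4, "fin2", "2017-05-12", 0),
        (5, "fin2", "2017-05-01", 1),
        (6, "fin3", "2017-12-01", 1),
    ],
)

planned_ids = [1, 3]  # example list from the question
placeholders = ", ".join("?" for _ in planned_ids)

sql = f"""
select p.id, p.fin, p.datetime, nexte.id, nexte.datetime
from (select e.*,
             (select e2.id
              from events e2
              where e2.fin = e.fin
                and e2.datetime > e.datetime
                and e2.failure = 1
              order by e2.datetime asc
              limit 1
             ) as next_event_id
      from events e
      where e.id in ({placeholders})
     ) p
left join events nexte on nexte.id = p.next_event_id
order by p.id
"""

# One round trip returns every alert with its next failure (or NULLs).
rows = conn.execute(sql, planned_ids).fetchall()
for row in rows:
    print(row)
```

The left join keeps alerts that have no later failure; for those, the columns taken from nexte come back as NULL (None in Python).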
The MySQL documentation says the following about the performance of correlated subqueries:
For certain cases, a correlated subquery is optimized. For example:
val IN (SELECT key_val FROM tbl_name WHERE correlated_condition)
Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.
Also according to this article:
We have shown that under some circumstances, correlated subqueries can be better than bulk aggregation. In Oracle. With small-medium sized data sets. In other cases, that’s not true as the size of M and N, our two algorithmic complexity variables increase, O(M log N) will be much worse than O(M + N).
Also, it seems that a kind of N+1 problem still exists even when correlated subqueries are used to build a single query, although it is less severe performance-wise than running N separate queries.
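Since the documentation quoted above suggests that rewriting as a join might help, here is one possible join-only formulation. It is a sketch, not a benchmarked recommendation: it assumes Python with an in-memory SQLite database standing in for MySQL, uses the sample data from the question, and relies on an anti-join (the alias earlier is a name chosen here) to keep only the failure that has no earlier failure between it and the alert.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (id integer primary key, fin text, datetime text, failure integer)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        (1, "fin1", "2016-04-02", 0),
        (2, "fin1", "2016-05-02", 1),
        (3, "fin2", "2017-05-02", 0),
        (4, "fin2", "2017-05-12", 0),
        (5, "fin2", "2017-05-01", 1),
        (6, "fin3", "2017-12-01", 1),
    ],
)

planned_ids = [1, 3]  # example list from the question
placeholders = ", ".join("?" for _ in planned_ids)

sql = f"""
select a.id as alert_id, f.id as failure_id
from events a
-- candidate failures: same fin, later than the alert
left join events f
  on f.fin = a.fin and f.failure = 1 and f.datetime > a.datetime
-- anti-join: look for a failure strictly between the alert and the candidate
left join events earlier
  on earlier.fin = a.fin and earlier.failure = 1
 and earlier.datetime > a.datetime and earlier.datetime < f.datetime
where a.id in ({placeholders})
  and earlier.id is null  -- keep the candidate only if nothing precedes it
order by a.id
"""

pairs = conn.execute(sql, planned_ids).fetchall()
print(pairs)
```

Two caveats: if several failures for the same fin share the same earliest datetime, this returns one row per tie, and whether it actually beats the correlated subquery depends on the data size and on an index covering fin, failure and datetime, so it is worth measuring on real data.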
Upvotes: 1