Reputation: 8115
I am wondering if it's possible to run a single query that tells me the following:
Get the names of any applications that have (currently executing) jobs exceeding the average historical runtime
Table 1: Job History
Table 2: Applications
It seemed like a simple question to answer, but creating a mysql query has proven to be a rather involved (albeit intriguing) challenge...
The query needs to grab the jobs where satus = executing
, use the application.id
to look for other jobs that have a matching application.id
and status = success
, average the (end_time
- start_time
), and then compare the average time
with the (current_time
- start_time
) of the currently executing jobs. Finally, it must use the application ID from any jobs that fail this test to grab the application.name
from the application table. Is that even possible to do in a single query?
For the sake of this question, let's assume current_time
is passed in as an argument.
I made an attempt at a triple nested query, but I'm getting the following error, and I don't know why. I've spent a few hours trying to get this to work, but I'm already in over my head here:
ERROR 1054 (42S22): Unknown column 'jh.start_time' in 'having clause'
This is my attempt:
SELECT name FROM application
WHERE application.id IN (
SELECT application_id
FROM job_history AS jh
WHERE application_id IN (
SELECT application_id
FROM job_history
WHERE status='EXECUTION' )
AND jh.status='SUCCESS'
HAVING (avg(jh.end_time - jh.start_time)) < (current_time - jh.start_time)
);
Edit: As suggested, here is some sample data.
Table 1
+--------+------------+------------+----------+----------------+
| job_id | status | start_time | end_time | application_id |
+--------+------------+------------+----------+----------------+
| job1 | successful | 100 | 200 | app1 |
| job2 | failed | 150 | 350 | app2 |
| job3 | successful | 200 | 400 | app1 |
| job4 | execution | 500 | 0 | app1 |
| job5 | successful | 600 | 800 | app3 |
+--------+------------+------------+----------+----------------+
Table 2
+------+------------------+
| id | name |
+------+------------------+
| app1 | Team Green's app |
| app2 | Team Blue's app |
| app3 | Team Red's app |
+------+------------------+
I would like to grab job4, use the application_id
to find job1 and job3. Then take the average run time of job1 and job3 and compare it against the current runtime of job4. If the current runtime is greater than the average runtime, then I would like to report the application name: Team Green's app.
Upvotes: 1
Views: 102
Reputation: 62831
If I'm understanding your question correctly, here's one option using a join
. From here it's easy to adjust the where
criteria to your needs:
select distinct a.name
from applications a
join jobhistory jh on a.id = jh.applicationid
join (
select applicationid, avg(endtime - starttime) avgtime
from jobhistory
where status = 'success'
group by applicationid
) t on a.id = t.applicationid
where jh.status = 'execution' and
@current_time - jh.starttime > t.avgtime
Upvotes: 1