Ethan Strider
Ethan Strider

Reputation: 8115

Mysql compare average results with current result in a single query

I am wondering if it's possible to run a single query that tells me the following:

Get the names of any applications that have (currently executing) jobs exceeding the average historical runtime

Table 1: Job History

Table 2: Applications

It seemed like a simple question to answer, but creating a mysql query has proven to be a rather involved (albeit intriguing) challenge...

The query needs to grab the jobs where satus = executing, use the application.id to look for other jobs that have a matching application.id and status = success, average the (end_time - start_time), and then compare the average time with the (current_time - start_time) of the currently executing jobs. Finally, it must use the application ID from any jobs that fail this test to grab the application.name from the application table. Is that even possible to do in a single query?

For the sake of this question, let's assume current_time is passed in as an argument.

I made an attempt at a triple nested query, but I'm getting the following error, and I don't know why. I've spent a few hours trying to get this to work, but I'm already in over my head here:

ERROR 1054 (42S22): Unknown column 'jh.start_time' in 'having clause'

This is my attempt:

SELECT name FROM application
WHERE application.id IN (
    SELECT application_id 
    FROM job_history AS jh
    WHERE application_id IN (
        SELECT application_id 
        FROM job_history
        WHERE status='EXECUTION' )
    AND jh.status='SUCCESS'
    HAVING (avg(jh.end_time - jh.start_time)) < (current_time - jh.start_time)
    );

Edit: As suggested, here is some sample data.

Table 1

+--------+------------+------------+----------+----------------+
| job_id |   status   | start_time | end_time | application_id |
+--------+------------+------------+----------+----------------+
| job1   | successful |        100 |      200 | app1           |
| job2   | failed     |        150 |      350 | app2           |
| job3   | successful |        200 |      400 | app1           |
| job4   | execution  |        500 |        0 | app1           |
| job5   | successful |        600 |      800 | app3           |
+--------+------------+------------+----------+----------------+

Table 2

+------+------------------+
|  id  |       name       |
+------+------------------+
| app1 | Team Green's app |
| app2 | Team Blue's app  |
| app3 | Team Red's app   |
+------+------------------+

I would like to grab job4, use the application_id to find job1 and job3. Then take the average run time of job1 and job3 and compare it against the current runtime of job4. If the current runtime is greater than the average runtime, then I would like to report the application name: Team Green's app.

Upvotes: 1

Views: 102

Answers (1)

sgeddes
sgeddes

Reputation: 62831

If I'm understanding your question correctly, here's one option using a join. From here it's easy to adjust the where criteria to your needs:

select distinct a.name
from applications a 
    join jobhistory jh on a.id = jh.applicationid
    join (
      select applicationid, avg(endtime - starttime) avgtime
      from jobhistory
      where status = 'success'
      group by applicationid
    ) t on a.id = t.applicationid
where jh.status = 'execution' and 
  @current_time - jh.starttime > t.avgtime

Upvotes: 1

Related Questions