peterp
peterp

Reputation: 171

MySQL: how to increase speed of a select query with 2 joins and 1 subquery

In a table 'ttraces' I have many records for different tasks (whose value is held in 'taskid' column and is a foreign key of a column 'id' in a table 'ttasks'). Each task inserts a record to 'ttraces' every 8-10 seconds, so caching data to increase performance is not a good idea. What I need is to select only the newest records for each task from 'ttraces', that means the records with the maximum value of the column 'time'. At the moment, I have over 500000 records in the table. The very simplified structure of these two tables looks as follows:

-----------------------
|       ttasks        |
-----------------------
| id | name | blocked |
-----------------------

---------------------
|      ttraces       |
---------------------
| id | taskid | time |
---------------------

And my query is shown below:

SELECT t.name,tr.time
FROM 
    ttraces tr 
    JOIN 
    ttasks t ON tr.itask = t.id 
    JOIN (
        SELECT taskid, MAX(time) AS max_time
        FROM ttraces 
        GROUP BY itask
    ) x ON tr.taskid = x.taskid AND tr.time = x.max_time
WHERE t.blocked

All columns used in WHERE and JOIN clauses are indexed. As for now the query runs for ~1,5 seconds. It's extremely crucial to increase its speed. Thanks for all suggestions. BTW: the database is running on a hosted, shared server and I can't move it anywhere else for the moment.

[EDIT] EXPLAIN SELECT... results are:

--------------------------------------------------------------------------------------------------------------
id   select_type   table        type     possible_keys   key       key_len   ref          rows     Extra
--------------------------------------------------------------------------------------------------------------
1   PRIMARY        <derived2>   ALL      NULL            NULL      NULL      NULL         74   
1   PRIMARY        t            eq_ref   PRIMARY         PRIMARY   4         x.taskid     1        Using where
1   PRIMARY        tr           ref      taskid,time     time      9         x.max_time   1        Using where
2   DERIVED        ttraces      index    NULL            itask     5         NULL         570853   
--------------------------------------------------------------------------------------------------------------

The engine is InnoDB.

Upvotes: 2

Views: 965

Answers (4)

Neville Kuyt
Neville Kuyt

Reputation: 29629

I may be having a bit of a moment, but is this query not logically the same, and (almost certainly) faster?

SELECT t.id, t.name,max(tr.time)
FROM 
ttraces tr 
JOIN 
ttasks t ON tr.itask = t.id 
where BLOCKED
group by t.id, t.name

Upvotes: 1

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

If there are many traces for each task then you can keep a table with only the newest traces. Whenever you insert into ttraces you also upsert into ttraces_newest:

insert into ttraces_newest (id, taskid, time) values
(3, 1, '2012-01-01 08:02:01')
on duplicate key update
`time` = current_timestamp

The primary key to ttraces_newest would be (id, taskid). Querying ttraces_newest would be cheaper. How much cheaper depends on how many traces there are to each task. Now the query is:

SELECT t.name,tr.time
FROM 
    ttraces_newest tr 
    JOIN 
    ttasks t ON tr.itask = t.id 
WHERE t.blocked

Upvotes: 0

Amir
Amir

Reputation: 788

Does this code return correct result? If so how is its speed time?

SELECT t.name, max_time 
FROM ttasks t JOIN ( 
    SELECT taskid, MAX(time) AS max_time 
    FROM ttraces  
    GROUP BY taskid
) x ON t.id = x.taskid

Upvotes: 0

WojtusJ
WojtusJ

Reputation: 1318

Here's my idea... You need one composite index on ttraces having taskid and time columns (in that order). Than, use this query:

SELECT t.name,
       trm.mtime
FROM   ttasks AS t
       JOIN (SELECT taskid,
                    Max(time) AS mtime
             FROM   ttraces
             GROUP  BY taskid) AS trm
         ON t.id = trm.taskid
WHERE  t.blocked 

Upvotes: 0

Related Questions