Reputation: 171
In a table 'ttraces' I have many records for different tasks (whose value is held in 'taskid' column and is a foreign key of a column 'id' in a table 'ttasks'). Each task inserts a record to 'ttraces' every 8-10 seconds, so caching data to increase performance is not a good idea. What I need is to select only the newest records for each task from 'ttraces', that means the records with the maximum value of the column 'time'. At the moment, I have over 500000 records in the table. The very simplified structure of these two tables looks as follows:
-----------------------
| ttasks |
-----------------------
| id | name | blocked |
-----------------------
---------------------
| ttraces |
---------------------
| id | taskid | time |
---------------------
And my query is shown below:
SELECT t.name,tr.time
FROM
ttraces tr
JOIN
ttasks t ON tr.itask = t.id
JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY itask
) x ON tr.taskid = x.taskid AND tr.time = x.max_time
WHERE t.blocked
All columns used in WHERE and JOIN clauses are indexed. As for now the query runs for ~1,5 seconds. It's extremely crucial to increase its speed. Thanks for all suggestions. BTW: the database is running on a hosted, shared server and I can't move it anywhere else for the moment.
[EDIT] EXPLAIN SELECT... results are:
--------------------------------------------------------------------------------------------------------------
id select_type table type possible_keys key key_len ref rows Extra
--------------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 74
1 PRIMARY t eq_ref PRIMARY PRIMARY 4 x.taskid 1 Using where
1 PRIMARY tr ref taskid,time time 9 x.max_time 1 Using where
2 DERIVED ttraces index NULL itask 5 NULL 570853
--------------------------------------------------------------------------------------------------------------
The engine is InnoDB.
Upvotes: 2
Views: 965
Reputation: 29629
I may be having a bit of a moment, but is this query not logically the same, and (almost certainly) faster?
SELECT t.id, t.name,max(tr.time)
FROM
ttraces tr
JOIN
ttasks t ON tr.itask = t.id
where BLOCKED
group by t.id, t.name
Upvotes: 1
Reputation: 125214
If there are many traces for each task then you can keep a table with only the newest traces. Whenever you insert into ttraces you also upsert into ttraces_newest:
insert into ttraces_newest (id, taskid, time) values
(3, 1, '2012-01-01 08:02:01')
on duplicate key update
`time` = current_timestamp
The primary key to ttraces_newest would be (id, taskid). Querying ttraces_newest would be cheaper. How much cheaper depends on how many traces there are to each task. Now the query is:
SELECT t.name,tr.time
FROM
ttraces_newest tr
JOIN
ttasks t ON tr.itask = t.id
WHERE t.blocked
Upvotes: 0
Reputation: 788
Does this code return correct result? If so how is its speed time?
SELECT t.name, max_time
FROM ttasks t JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY taskid
) x ON t.id = x.taskid
Upvotes: 0
Reputation: 1318
Here's my idea... You need one composite index on ttraces
having taskid
and time
columns (in that order). Than, use this query:
SELECT t.name,
trm.mtime
FROM ttasks AS t
JOIN (SELECT taskid,
Max(time) AS mtime
FROM ttraces
GROUP BY taskid) AS trm
ON t.id = trm.taskid
WHERE t.blocked
Upvotes: 0