Reputation: 1176
I have following somewhat complex sql
query which has horrible performance, 'certainly' due to the inner query inside the where
clause. In some cases it takes over a minute. Does anybody know how to rewrite this query giving better performance?
The query:
SELECT DISTINCT t.id as taskId, t.name as taskName,
t.startdate as taskStartDate, t.enddate as taskEndDate,
t.proj_id as taskProjectId
FROM PROJECT p, EMPL_PROJ ep, TASK t, TIMERECORD tr
WHERE
ep.empl_id = ? AND
ep.proj_id = p.id AND
ep.proj_id = t.proj_id AND
((p.startdate IS NULL AND p.enddate IS NULL) OR
(p.startdate IS NULL AND p.enddate >= ?) OR
(p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR
(p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND
((t.startdate IS NULL AND t.enddate IS NULL) OR
(t.startdate IS NULL AND t.enddate >= ?) OR
(t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR
(t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND
(
(ep.empl_id = tr.empl_id AND
ep.proj_id = tr.proj_id AND
t.id = tr.task_id AND tr.day <= ? + INTERVAL 7 DAY AND
tr.day >= ? + INTERVAL -14 DAY
) OR
(
(SELECT count(*)
FROM TIMERECORD tr2
WHERE
tr2.empl_id=ep.empl_id AND
tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND
tr2.day >= ? + INTERVAL -14 DAY) <= 0
)
)
I'm using mysql server 5.1.40.
Edit (2): With the comments and answers I came to this query which executes under a second (not bad coming from almost a minute!)
SELECT DISTINCT t.id as taskId, t.name as taskName,
t.startdate as taskStartDate, t.enddate as taskEndDate,
t.proj_id as taskProjectId
FROM (PROJECT p INNER JOIN EMPL_PROJ ep ON ep.proj_id = p.id)
INNER JOIN TASK t ON p.id=t.proj_id
INNER JOIN TIMERECORD tr ON tr.empl_id=ep.empl_id AND tr.proj_id=ep.proj_id
AND tr.task_id=t.id
WHERE
ep.empl_id = ? AND
((p.startdate IS NULL AND p.enddate IS NULL) OR
(p.startdate IS NULL AND p.enddate >= ?) OR
(p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR
(p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND
((t.startdate IS NULL AND t.enddate IS NULL) OR
(t.startdate IS NULL AND t.enddate >= ?) OR
(t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR
(t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND
(
(
tr.day <= ? + INTERVAL 7 DAY AND
tr.day >= ? + INTERVAL -14 DAY
) OR
(
NOT EXISTS(SELECT *
FROM TIMERECORD tr2 INNER JOIN EMPL_PROJ ON tr2.empl_id=EMPL_PROJ.empl_id
INNER JOIN PROJECT ON PROJECT.id=tr2.proj_id
WHERE
tr2.day BETWEEN ? + INTERVAL -14 DAY AND ? + INTERVAL 7 DAY)
)
)
ORDER BY p.id, t.id
Biggest contribution was the answer suggesting the NOT EXISTS
approach (which I marked as being correct) and the comment not to mix the explicit
and implicit JOIN
's.
Thanks to all!
Upvotes: 1
Views: 336
Reputation: 5694
Get rid of the sub query.
(1). Compute the sub query separately with the columns empl_id,proj_id,count(*) for all empl_ids & proj_ids where the day falls in the required range. This is a simple group by query.
select empl_id,proj_id,count(*) as ct from TIMERECORD
where day between (? + INTERVAL -14 DAY) and (? + INTERVAL 7 DAY)
group by empl_id,proj_id;
Call this result set B
(2). Compute the remaining query as you are doing now. Call this resultset A
(3). Do A left outer join B using columns empl_id,proj_id that are common in A & B
then in the where clause you can check the value in B.ct column, it will be null for all empl_id,proj_id combinations for which no entry was found in TIMERECORD table for the given time range.
Actually you don't even need count(*) since you are not bothered about the actual count. But let me not complicate it more than necessary.
Upvotes: 0
Reputation: 21106
You are using COUNT(*) when you seem to only need a NOT EXISTS...
(
(SELECT count(*)
FROM TIMERECORD tr2
WHERE
tr2.empl_id=ep.empl_id AND
tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND
tr2.day >= ? + INTERVAL -14 DAY) <= 0
)
Replace with
(
NOT EXISTS(SELECT *
FROM TIMERECORD tr2
WHERE
tr2.empl_id=ep.empl_id AND
tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND
tr2.day >= ? + INTERVAL -14 DAY)
)
Now if a TIMERECORD does exist that part of the where clause will short circuit to FALSE (NOT TRUE) without having to count every TIMERECORD.
Upvotes: 2