jb10210
jb10210

Reputation: 1176

Query performance issue with query inside where clause

I have following somewhat complex sql query which has horrible performance, 'certainly' due to the inner query inside the where clause. In some cases it takes over a minute. Does anybody know how to rewrite this query giving better performance?
The query:

SELECT DISTINCT t.id as taskId, t.name as taskName, 
  t.startdate as taskStartDate, t.enddate as taskEndDate, 
  t.proj_id as taskProjectId 
FROM PROJECT p, EMPL_PROJ ep, TASK t, TIMERECORD tr 
WHERE 
  ep.empl_id = ? AND 
  ep.proj_id = p.id AND 
  ep.proj_id = t.proj_id AND 
  ((p.startdate IS NULL AND p.enddate IS NULL) OR 
   (p.startdate IS NULL AND p.enddate >= ?) OR 
   (p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR 
   (p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND 
  ((t.startdate IS NULL AND t.enddate IS NULL) OR 
   (t.startdate IS NULL AND t.enddate >= ?) OR 
   (t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR 
  (t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND 
  (
   (ep.empl_id = tr.empl_id AND 
    ep.proj_id = tr.proj_id AND 
    t.id = tr.task_id AND tr.day <= ? + INTERVAL 7 DAY AND 
    tr.day >= ? + INTERVAL -14 DAY
   ) OR 
   (
    (SELECT count(*) 
     FROM TIMERECORD tr2 
     WHERE 
     tr2.empl_id=ep.empl_id AND 
     tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
     tr2.day >= ? + INTERVAL -14 DAY) <= 0
    )
  ) 

I'm using mysql server 5.1.40.

Edit (2): With the comments and answers I came to this query which executes under a second (not bad coming from almost a minute!)

SELECT DISTINCT t.id as taskId, t.name as taskName, 
  t.startdate as taskStartDate, t.enddate as taskEndDate, 
  t.proj_id as taskProjectId 
FROM (PROJECT p INNER JOIN EMPL_PROJ ep ON  ep.proj_id = p.id)  
  INNER JOIN TASK t ON p.id=t.proj_id 
  INNER JOIN TIMERECORD tr ON tr.empl_id=ep.empl_id AND tr.proj_id=ep.proj_id 
    AND tr.task_id=t.id
WHERE 
  ep.empl_id = ? AND 
  ((p.startdate IS NULL AND p.enddate IS NULL) OR 
   (p.startdate IS NULL AND p.enddate >= ?) OR 
   (p.enddate IS NULL AND p.startdate <= ? + INTERVAL 6 DAY) OR 
   (p.startdate <= ? + INTERVAL 6 DAY AND p.enddate >= ?) ) AND 
  ((t.startdate IS NULL AND t.enddate IS NULL) OR 
   (t.startdate IS NULL AND t.enddate >= ?) OR 
   (t.enddate IS NULL AND t.startdate <= ? + INTERVAL 6 DAY) OR 
   (t.startdate <= ? + INTERVAL 6 DAY AND t.enddate >= ?)) AND 
  (
   (
    tr.day <= ? + INTERVAL 7 DAY AND 
    tr.day >= ? + INTERVAL -14 DAY
   ) OR 
   (
    NOT EXISTS(SELECT *
      FROM TIMERECORD tr2 INNER JOIN EMPL_PROJ ON tr2.empl_id=EMPL_PROJ.empl_id 
        INNER JOIN PROJECT ON PROJECT.id=tr2.proj_id
      WHERE 
       tr2.day BETWEEN ? + INTERVAL -14 DAY AND ? + INTERVAL 7 DAY)
   )
  ) 
  ORDER BY p.id, t.id

Biggest contribution was the answer suggesting the NOT EXISTS approach (which I marked as being correct) and the comment not to mix the explicit and implicit JOIN's.

Thanks to all!

Upvotes: 1

Views: 336

Answers (2)

Dojo
Dojo

Reputation: 5694

Get rid of the sub query.

(1). Compute the sub query separately with the columns empl_id,proj_id,count(*) for all empl_ids & proj_ids where the day falls in the required range. This is a simple group by query.

select empl_id,proj_id,count(*) as ct from TIMERECORD 
where day between (? + INTERVAL -14 DAY) and (? + INTERVAL 7 DAY)
group by empl_id,proj_id;

Call this result set B

(2). Compute the remaining query as you are doing now. Call this resultset A

(3). Do A left outer join B using columns empl_id,proj_id that are common in A & B

then in the where clause you can check the value in B.ct column, it will be null for all empl_id,proj_id combinations for which no entry was found in TIMERECORD table for the given time range.

Actually you don't even need count(*) since you are not bothered about the actual count. But let me not complicate it more than necessary.

Upvotes: 0

Louis Ricci
Louis Ricci

Reputation: 21106

You are using COUNT(*) when you seem to only need a NOT EXISTS...

(
(SELECT count(*) 
 FROM TIMERECORD tr2 
 WHERE 
 tr2.empl_id=ep.empl_id AND 
 tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
 tr2.day >= ? + INTERVAL -14 DAY) <= 0
)

Replace with

(
NOT EXISTS(SELECT * 
 FROM TIMERECORD tr2 
 WHERE 
 tr2.empl_id=ep.empl_id AND 
 tr2.proj_id=p.id AND tr2.day <= ? + INTERVAL 7 DAY AND 
 tr2.day >= ? + INTERVAL -14 DAY)
)

Now if a TIMERECORD does exist that part of the where clause will short circuit to FALSE (NOT TRUE) without having to count every TIMERECORD.

Upvotes: 2

Related Questions