All_Safe
All_Safe

Reputation: 1399

Optimize derived table in select

I have sql query:

SELECT tsc.Id
FROM TEST.Services tsc, 
( 
    select * from DICT.Change sp 

) spc
where tsc.serviceId = spc.service_id 
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id) 
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate) 
group by tsc.Id; 

This query is very, very slow.

Explain: enter image description here Can this be optimized? How to rewrite this subquery for another?

Upvotes: 0

Views: 35

Answers (1)

spencer7593
spencer7593

Reputation: 108410

What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?

Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.

To answer your questions:

Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.

Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.

With no join predicates, it's a "cross" join. Every row matched from one side matched to every row from the right side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or, join predicates in the WHERE clause) is intentional, and not an oversight.

I'd also avoid an inline view, unless there is a specific reason for one.


UPDATE

The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:

SELECT tsc.id
  FROM TEST.Services tsc
  JOIN DICT.Change spc
    ON tsc.serviceid  = spc.service_id
   AND tsc.startdate  > spc.starttime
   AND tsc.startdate  > spc.starttdate
   AND ( tsc.planid   = spc.plan_id
      OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
       )

Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.


Some notes:

If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why or if we need to return duplicate values. IF we need to count the number of copies of each tsc,id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.

GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".

Also, this condition:

tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id) 

can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)


If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write

SELECT tsc.id
  FROM TEST.Services tsc
 WHERE EXISTS 
       ( SELECT 1
           FROM DICT.Change spc
             ON spc.service_id = tsc.serviceid
            AND spc.starttime  < tsc.startdate
            AND spc.starttdate < tsc.startdate
            AND ( ( spc.plan_id = tsc.planid )
               OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
                )
       )

Upvotes: 1

Related Questions