Reputation: 1399
I have sql
query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How to rewrite this subquery for another?
Upvotes: 0
Views: 35
Reputation: 108410
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id
column from Services
table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN
keyword instead.
With no join predicates, it's a "cross" join. Every row matched from one side matched to every row from the right side.) I recommend including the CROSS
keyword as an indication to future readers that the absence of an ON
clause (or, join predicates in the WHERE
clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.starttdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN
to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc
that "match" a row from tsc
, the query will return duplicate values of tsc.id
. (It's not clear why or if we need to return duplicate values. IF we need to count the number of copies of each tsc,id
, we could do that in the query, returning distinct values of tsc.id
along with a count. If we don't need duplicates, we could return just a distinct list.
GREATEST
function will return NULL
if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)
", we can specify "a > b AND a > c
".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id
, assuming id
is unique in TEST.Services
, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
ON spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.starttdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
Upvotes: 1