Reputation: 13940
tl;dr: Will the query optimizer be able to optimize the `max` calls?
select id
, max(case when this and that11 and that12 then 1 else 0 end) name1
, max(case when this and that21 and that22 then 1 else 0 end) name2
-- ... there's an average of 17 max() over 6 similar selects
, max(case when this and thatN1 and thatN2 then 1 else 0 end) nameN
from firstTable
join secondTable ...
join thirdTable ...
join fourthTable ...
where (1=1)
We have a partnership with a data source: when we want to access their data, we tell them what we want and they provide the query. We get copies of it, but we don't have access to change it on the database, nor do we have the ability to run a refactored version in a test environment. In one case, we were given a stored proc that included 6 instances similar to the above example; they only differ in the number of columns calculated with `max` and the `where` condition.
I spend most of my time at a higher level in ORMs, but this looked suspect to me. It's maximizing what amounts to a Boolean. `max` cannot guarantee which of `1` or `1` will be returned; but even if it could, there's no more identifiable data, so it's being used like an `Any()` check. `max` is an aggregate, so it needs to traverse the whole set to find the max, but ideally it would break as soon as it found the first match.
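To make the "used like an `Any()` check" point concrete, here is a hedged sketch of what one flag column could look like as a correlated `EXISTS` instead of a `max(case ...)`; the table names and predicates are the placeholders from the example above, and the correlation condition `s.firstTableId = f.id` is purely hypothetical:

```sql
-- Hypothetical rewrite of one flag column as an existence test.
-- EXISTS can stop at the first matching row instead of scanning
-- the whole group the way an aggregate does.
select f.id
     , case when exists (select 1
                         from secondTable s
                         where s.firstTableId = f.id   -- assumed join key
                           and this and that11 and that12)
            then 1 else 0 end as name1
from firstTable f
```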
We are having timeout issues, and even though the owners of that system increased the timeout to an hour, it still fails when trying to get all the data we need. The query is generally pretty gnarly, but this usage of `max` stuck out as a likely culprit, so I wanted to get an idea whether I was on the right track before scrutinizing the rest.
Am I right to believe this is at least inefficient? And if so, are query optimizers (Oracle, in this case) "smart" enough to help with this?
Upvotes: 1
Views: 57
Reputation: 6002
TL;DR: No
As always, without showing us the query plan, your guess is as good as ours, so the below needs to be taken with a (huge) grain of salt.
Simply looking at the provided 'sample query' I see:

- a bunch of `JOIN`s that may or may not be bottlenecking your query
- a bunch of this1, this2, ... that1, that2, ... conditions that may or may not be bottlenecking your query:
  - if these are simple comparisons like `fieldA < fieldB`, then (IMHO) the overhead should be minimal
  - if these are function calls like `myFunction(fieldA, fieldB) > x`, then yeah, this could potentially be eating into your performance
- `WHERE 1=1`, implying there could be gazillions of records involved here
Again, without the actual query plan this is all just guessing on my part. That said, if I were presented with this situation I might consider pre-aggregating all the fields involved first, so as to avoid duplicate 'this and that' calculations; whether that's feasible depends a lot on the 'this and that's involved.
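As a rough sketch of that pre-aggregation idea (everything here reuses the placeholder names from the question; the `group by` assumes `id` is the grain you want), each predicate would be evaluated exactly once and the aggregation would then work on plain flags:

```sql
-- Hypothetical sketch: compute each 'this and that' flag once,
-- then aggregate the precomputed 0/1 values per id.
with flags as (
  select id
       , case when this and that11 and that12 then 1 else 0 end as flag1
       , case when this and that21 and that22 then 1 else 0 end as flag2
       -- ... one flag per max() column
  from firstTable
  join secondTable ...
  join thirdTable ...
)
select id
     , max(flag1) as name1
     , max(flag2) as name2
from flags
group by id
```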
Alternatively, converting the query to 17-ish `WHERE EXISTS()` sub-queries as you suggest might indeed be a valid approach, but its performance will again depend on the data: a `WHERE EXISTS()` can stop at the first matching row, but remember, that's 17 `WHERE EXISTS()` checks for each `id` (!).

All in all, a big YMMV. Given the input presented there is no simple answer, I fear -- is there ever with SQL? Try to get hold of the query plan and draw conclusions from that. If there is no way to obtain it, some experimenting might be in order...
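For reference, if you (or the owners of the system) can run anything against that Oracle database, the estimated plan can be captured without executing the query; the statement below is just the sample query stub from the question dropped into the standard `EXPLAIN PLAN` / `DBMS_XPLAN` pattern:

```sql
-- Capture Oracle's estimated plan without running the query.
explain plan for
select id
     , max(case when this and that11 and that12 then 1 else 0 end) name1
from firstTable
join secondTable ...
where (1=1);

-- Display the plan that was just captured.
select * from table(dbms_xplan.display);
```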
Upvotes: 1