ChiefTwoPencils
ChiefTwoPencils

Reputation: 13940

What kind of query optimization can be expected?

tl;dr; Will the query optimizer be able to optimize the max calls?

select id
    , max(case when this and that11 and that12 then 1 else 0 end) name1
    , max(case when this and that21 and that22 then 1 else 0 end) name2
    -- ... there's an average of 17 max() over 6 similar selects
    , max(case when this and thatN1 and thatN2 then 1 else 0 end) nameN
from firstTable
    join secondTable ...
    join thirdTable  ...
    join fourthTable ...
where (1=1)

We have a partnership with a data source such that when we want to access their data, we tell them what we want and they provide the query; we get copies but we don't have access to change it on the database nor do we have the ability to run a refactored version in a test environment. In one case, we were given a stored proc that included 6 instances similar to the above example; they only differ by the number of columns calculated with max and the where condition.

I spend most of my time at a higher level in ORMs but this looked suspect to me. It's maximizing what amounts to a Boolean. max cannot guarantee which of 1 or 1 will be returned; but even if it could, there's no more identifiable data so it's being used like an Any() check. max is an aggregate so it needs to traverse the whole set to find that max but ideally, it would break as soon as it found the first match.

We are having timeout issues and even though the owners of that system increased the timeout to an hour, it still fails when trying to get all the data we need. The query is generally pretty gnarly but this usage of max stuck out as a likely culprit so I wanted to get an idea if I was on the right track before scrutinizing the rest.

Am I right to believe this is at least inefficient? Are query optimizers (oracle in this case) "smart" enough to help with this if so?

Upvotes: 1

Views: 57

Answers (1)

deroby
deroby

Reputation: 6002

TL;DR: No

As always, without showing us the queryplan, your guess is as good as ours so the below needs to be taken with a (huge) grain of salt:

Simply looking at the provided 'sample query' I see

  • a bunch of JOINs that may or may not be bottlenecking your query

    • Indexes play a huge role here
    • Up-to-date statistics play a huge role here
    • The 'layout' of your data may be influencing your query
      • if two tables join in a 1:1 fashion that's usually fine
      • if two tables join in a 1:100 fashion, and that happens for some others too then things will explode really, really fast.
  • a bunch of this1, this2, ... that1, that2, .. that may or may not be bottlenecking your query

    • keep in mind that whatever fields are being used here, they also need to be fetched from the tables (read: they may not be part of the indexes being used to join the tables so they'll require extra IO or the optimizer might decide to skip the index and simply scan the table instead)
    • if 'this' is something like 'fieldA < fieldB' then (IMHO) the overhead should be minimal
    • if 'this' is something like myFunction(fieldA, fieldB) > x then yeah, this potentially could be eating into your performance
  • WHERE 1=1 implying there could be gazillions of records involved here

    • Processing heaps of data IS going to take time, regardless of your approach

Again, without the actual queryplan this is all just guessing from my part. That said, if I'd be presented with this situation I might consider pre-aggregating all the fields involved first as to avoid duplicate 'this and that' calculations, whether that's feasible depends a lot on the 'this and thats' involved.

Alternatively, converting the query to 17-ish WHERE EXISTS() queries as you suggest might indeed be a valid approach but its performance will depend on the data again:

  • imagine that all 17 return 'false' you may end up running the above query 17 times for it's full length rather than just the once with some post-calculations you do now. Keep in mind however that even in that case there will be more chances that indexes help you out (bigtime!) here, which is 'less likely' for the current construction.
  • also, it's not just 17 WHERE EXISTS(), it's 17 WHERE EXISTS() for each id (!)

All in all, a big YMMV. Given the input presented there is no simple answer I fear -- is there ever with SQL?. Try to get a hold of the queryplan and draw conclusions from that. If there is no way to obtain that, some experimenting might be in order...

Upvotes: 1

Related Questions