oneplusone
oneplusone

Reputation: 33

DBMS optimizer - best execution plan, no matter the query's formulation

If one writes query in a relational DBMS, won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates it?

I'm curious about SQL Server and Oracle.

For instance:

SELECT * 
FROM t1, t2
WHERE t1.some_column = t2.some_column

Provided the right indexes (with the right selectivity) exist, we should see index seeks followed maybe by key lookups. What we won't see is a cross product followed by a selection in the execution plan.

Then why does https://technet.microsoft.com/en-us/library/ms189575(v=sql.105).aspx state this?

In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance.

No matter how you write a query and no matter its query class (SPJ, SPJ + UNION, SPJ + subqueries, etc), won't the optimizer find the best semantically equivalent version?

Upvotes: 3

Views: 253

Answers (3)

Daniel Heilper
Daniel Heilper

Reputation: 1250

For non-trivial queries, it will most likely not give you the most optimized execution plan. One reason is that finding an optimal optimization query rewrite is an np-hard problem. For instance, join ordering for cost minimization is considered np-hard (number of possible generated trees from n nodes is n^(n-2) Cayley's formula), and the cost functions are heuristics (based on attributes such as cardinality, sparsity, storage model, etc...). And join ordering is only a subset of the join optimization work, which itself a subset of the whole query optimization work.

Upvotes: 0

Stefanos Zilellis
Stefanos Zilellis

Reputation: 611

Definitely not. Most times it will be one of the best ways yes, but always the best? No. The optimizer has to deal with any statement, applied to any schema, which contains any data. Two different queries having the exact same logic (always respond the same data result) will probably have different execution plans.

Upvotes: 0

TheGameiswar
TheGameiswar

Reputation: 28900

won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates Q?

I would like to quote Itzik Ben-Gan words from this book : Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

There are several reasons for this.

For one, SQL Server’s optimizer is not perfect. I don’t want to sound unappreciative—SQL Server’s optimizer is truly a marvel when you think of what this software component can achieve. But it’s a fact that it doesn’t have all possible optimization rules encoded within it.

Two, the optimizer has to limit the amount of time spent on optimization; otherwise, it could spend a much longer time optimizing a query than the amount of time the optimization shaves off from the run time of the query.

The situation could be as absurd as producing a plan in a matter of several dozen milliseconds without going over all possible plans and getting a run time of only seconds, but producing all possible plans in hopes of shaving off a couple of seconds might take a year or even several. You can see that, for practical reasons, the optimizer needs to limit the time spent on optimization.

Based on factors like the sizes of the tables involved in the query, SQL Server calculates two values: one is a cost consid- ered good enough for the query, and the other is the maximum amount of time to spend on optimization before stopping. If either threshold is reached, optimization stops, and SQL Server uses the best plan found at that point.

In summary there are few statements that are optimized,few not

Upvotes: 2

Related Questions