shA.t
shA.t

Reputation: 16958

Using 'where then Union' or Using 'Union then Where'

Please have these two types of query in your mind:

--query1
Select someFields
From someTables
Where someWhereClues
Union all
Select someFields
FROM some Tables
Where someWhereClues

--query2
Select * FROM (
    Select someFields 
    From someTables
    Union all 
    Select someFields
    FROM someTables
    ) DT
Where someMixedWhereClues

Note :
In both queries final result fields are same

I thought the 1st. query is faster or its performance is better!
But after some researches I confused by this note:

SQL Server (as a sample of RDBMS) first reads whole data then seek records. => so in both queries all records will read and seek.

Please Help me on my misunderstandings, and on if there is any other differences between query1 and query2 ?


Edit: adding sample plans:

select t.Name, t.type from sys.tables t where t.type = 'U'
union all
select t.Name, t.type from sys.objects t where t.type = 'U'

select * from (
    select t.Name, t.type from sys.tables t
    union all
    select t.Name, t.type from sys.objects t
    ) dt
where dt.type = 'U'

Execution Plans are: enter image description here enter image description here

both are same and 50%

Upvotes: 10

Views: 2931

Answers (6)

DrewJordan
DrewJordan

Reputation: 5314

SQLServer will optimize both of those queries down to the same thing, as shown in the execution plans you posted. It's able to do this because in this case the queries are fairly simple; in another case it's possible for it to turn out differently. As long as you're composing a query, you should try to follow the same general rules that the optimizer does, and filter as soon as possible to limit the resultset that returns. By telling it that you first want to only get 'U' records, and then combine those results, you will prepare the query for later revisions which could invalidate the optimizer's choices which led to the same execution plan.

In short, you don't have to force simple queries to be optimal, but it's a good habit to have, and it will help when creating more complex queries.

Upvotes: 1

dean
dean

Reputation: 10098

As a rule of thumb, I will always consider using the first type of the query.

In made-up samples and queries with simple WHERE predicates both will use the same plan. But in a more complex query, with more complicated predicates, the optimizer might not come up with an equally efficient solution for the second type of query (it's just an optimizer, and is bound by resource and time constraints). The more complex the query is, the less chance is the optimizer finds the best execution plan (as it will eventually time-out and choose the least worst plan found so far). And it gets even worse if the predicates are ORed.

Upvotes: 1

Nick N.
Nick N.

Reputation: 13559

The SQL Server query optimizer, optimizes both queries so you get nearly the same performance.

Upvotes: 5

Rick James
Rick James

Reputation: 142238

The first one cannot be slower. Here is the reasoning:

  • If the WHERE clauses in the first can efficiently use an INDEX, there will be fewer rows to collect together in the UNION. Fewer rows --> faster.
  • The second one does not have an INDEX on the UNION, hence the WHERE cannot be optimized in that way.

Here are things that could lead to the first being slower. But I see them as exceptions, not the rule.

  • A different amount of parallelism is achieved.
  • Different stuff happens to be cached at the time you run the queries.

Caveat: I am assuming all three WHERE clauses are identical (as your example shows).

Upvotes: 1

Necreaux
Necreaux

Reputation: 9776

In my experience, there is no straightforward answer to this and it varies based on the nature of the underlying query. As you have shown, the optimizer comes up with the same execution plan in both of those scenarios, however that is not always the case. The performance is usually similar, but sometimes the performance can vary drastically depending on the query. In general I only take a closer look at it when performance is bad for no good reason.

Upvotes: 0

Dmitrij Kultasev
Dmitrij Kultasev

Reputation: 5745

In my practice 1st option was never slower than the 2nd. I think that optimizer is smart enough to optimize these plans more or less in the same manner. However I made some tests and the 1st option was always better. For example:

CREATE TABLE #a ( a INT, b INT );

WITH Numbers ( I ) AS (
    SELECT 1000

    UNION ALL

    SELECT I + 1
    FROM Numbers
    WHERE I < 5000
)
INSERT INTO #a ( a )
SELECT I
FROM Numbers
ORDER BY CRYPT_GEN_RANDOM(4)
OPTION ( MAXRECURSION 0 );

WITH Numbers ( I ) AS (
    SELECT 1000

    UNION ALL

    SELECT I + 1
    FROM Numbers
    WHERE I < 5000
)
INSERT INTO #a ( b )
SELECT I
FROM Numbers
ORDER BY CRYPT_GEN_RANDOM(4)
OPTION ( MAXRECURSION 0 );

SELECT a, b
FROM #a
WHERE a IS NOT NULL
UNION ALL
SELECT a, b
FROM #a
WHERE b IS NOT NULL

SELECT *
FROM (
    SELECT a, b
    FROM #a
    UNION ALL
    SELECT a, b
    FROM #a
    ) c
WHERE a IS NOT NULL
    OR b IS NOT NULL

The result is 47% vs 53%

Upvotes: 0

Related Questions