Emarian
Emarian

Reputation: 73

SQL Server - why is scanning done twice for the same table?

Does anyone know why sql server chooses to query the table 'building' twice? Is there any explanation? Can it be done with only one table seek?

This is the code sample:

DECLARE @id1stBuild INT = 1
    ,@number1stBuild INT = 2
    ,@idLastBuild INT = 5
    ,@numberLastBuild INT = 1;
DECLARE @nr TABLE (nr INT);

INSERT @nr
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

CREATE TABLE building (
    id INT PRIMARY KEY identity(1, 1)
    ,number INT NOT NULL
    ,idStreet INT NOT NULL
    ,surface INT NOT NULL
    )

INSERT INTO building (number,idStreet,surface)
SELECT bl.b
    ,n.nr
    ,abs(convert(BIGINT, convert(VARBINARY, NEWID()))) % 500
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY n1.nr) b
    FROM @nr n1
    CROSS JOIN @nr n2
    CROSS JOIN @nr n3
    ) bl
CROSS JOIN @nr n

--***** execution plan for the select below
SELECT *
FROM building b
WHERE b.id = @id1stBuild
    AND b.number = @number1stBuild
    OR b.id = @idLastBuild
    AND b.number = @numberLastBuild

DROP TABLE building

The execution plan for this is always the same: Two Clustered Index Seek unified through Merge Join (Concatenation). The rest is less important. Here is the execution plan:

enter image description here

Upvotes: 7

Views: 1109

Answers (3)

Martin Smith
Martin Smith

Reputation: 453668

It's not scanning twice. It is seeking twice.

Your query is semantically the same as the below.

SELECT *
FROM   building b
WHERE  b.id = @id1stBuild
       AND b.number = @number1stBuild
UNION
SELECT *
FROM   building b
WHERE  b.id = @idLastBuild
       AND b.number = @numberLastBuild 

And the execution plan performs two seeks and unions the result.

Upvotes: 6

Steve Ford
Steve Ford

Reputation: 7763

You can try the following, which gives only one seek and a slight performance improvement. As @Martin_Smith says what you have coded is the equivalent of a Union

SELECT *
FROM building b
WHERE b.id IN (@id1stBuild , @idLastBuild) 
    AND 
        (
            (b.id = @id1stBuild AND b.number = @number1stBuild) OR 
            (b.id = @idLastBuild AND b.number = @numberLastBuild)
        )

Upvotes: 2

Remus Rusanu
Remus Rusanu

Reputation: 294407

why is scanning done twice for the same table?

Is not a scan, is a seek, and that makes all the difference.

Implementing OR as a UNION, and then implementing the UNION via a MERGE JOIN. Is called a 'merge union':

Merge union

Now let’s change the query slightly:

select a from T where b = 1 or c = 3

  |--Stream Aggregate(GROUP BY:([T].[a]))
   |--Merge Join(Concatenation)
        |--Index Seek(OBJECT:([T].[Tb]), SEEK:([T].[b]=(1)) ORDERED FORWARD)
        |--Index Seek(OBJECT:([T].[Tc]), SEEK:([T].[c]=(3)) ORDERED FORWARD)

Instead of the concatenation and sort distinct operators, we now have a merge join (concatenation) and a stream aggregate. What happened? The merge join (concatenation) or “merge union” is not really a join at all. It is implemented by the same iterator as the merge join, but it really performs a union all while preserving the order of the input rows. Finally, we use the stream aggregate to eliminate duplicates. (See this post for more about using stream aggregate to eliminate duplicates.) This plan is generally a better choice since the sort distinct uses memory and could spill data to disk if it runs out of memory while the stream aggregate does not use memory.

Upvotes: 4

Related Questions