Need some assistance understanding a SQL Server 2012 query plan

Question

I have the following query:

Select TOP 5000
    CdCl.SubId
From dbo.PanelCdCl CdCl WITH (NOLOCK)
    Inner Join dbo.PanelHistory PH ON PH.SubId = CdCl.SubId
Where CdCl.PanelCdClStatusId IS NULL And PH.LastProcessNumber >= 1605
Order By CdCl.SubId

The query plan looks as follows:

[image: query plan]

Both the PanelCdCl and PanelHistory tables have a clustered index / primary key on SubId, and it's the only column in the index. There is exactly one row for each SubId in each table. Both tables have ~35M total rows in them.

I'm curious why the query plan is showing a clustered index scan on PanelHistory when the join is being done on the clustered index column.

RBarryYoung · Accepted Answer

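It's not scanning PanelHistory's clustered index (SubId) to find a SubId; it's scanning it to find all rows where LastProcessNumber >= 1605. The clustered index is the table itself, so reading through it is how the filter on LastProcessNumber gets evaluated. This is the first logical step.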
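If the goal is to let that filter run as a seek instead of a scan, one option is an index on the filtered column. The following is only a sketch: the index name is hypothetical and not part of the original schema, and whether the optimizer actually uses it depends on how selective LastProcessNumber >= 1605 is on the real data.

```sql
-- Hypothetical index; the name is an assumption made for illustration.
-- On a table with a clustered index, every nonclustered index row also
-- carries the clustering key (SubId), so this index alone can answer
-- "which SubIds have LastProcessNumber >= 1605".
CREATE NONCLUSTERED INDEX IX_PanelHistory_LastProcessNumber
    ON dbo.PanelHistory (LastProcessNumber);
```

Rows read from such an index come back in LastProcessNumber order rather than SubId order, so a Merge-Join would then need a sort. The optimizer may well decide the clustered index scan is still cheaper.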

Then it likewise scans PanelCdCl to find all rows where PanelCdClStatusId IS NULL. Since both tables are clustered on SubId, both inputs are already sorted on the join column, so it can do a Merge-Join without an additional sort. (Merge-Join is almost always the most efficient choice when it doesn't have to re-sort its input rows.)

Then it doesn't have to do a Sort for the ORDER BY, because the Merge-Join output is already in SubId order.

And finally, it does the TOP, which has to come after everything else (by the rules of SQL's logical clause processing order).

So the only place it tests SubId values is in the Merge-Join; it never pushes that test down to the scans. This would probably remain true if it did a Hash-Join instead. Only for a Nested-Loop Join would it have to push the SubId test down as a seek on a table, and that should only happen on the lower branch, not the upper one.
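One way to check this reasoning is to force each physical join type with a query hint and compare the resulting plans. This is a diagnostic sketch that reuses the query from the question; join hints like these are generally not something to leave in production code, and the plans described in the comments are what I'd expect, not something verified against this database.

```sql
SET STATISTICS IO ON;

-- Force a Nested-Loop Join: the lower (inner) input should now show
-- a clustered index seek on SubId instead of a scan.
SELECT TOP 5000
    CdCl.SubId
FROM dbo.PanelCdCl CdCl WITH (NOLOCK)
    INNER JOIN dbo.PanelHistory PH ON PH.SubId = CdCl.SubId
WHERE CdCl.PanelCdClStatusId IS NULL AND PH.LastProcessNumber >= 1605
ORDER BY CdCl.SubId
OPTION (LOOP JOIN);

-- Force a Hash-Join: both inputs should still be scans, and because
-- hash join output is not ordered, a Sort for the ORDER BY is likely.
SELECT TOP 5000
    CdCl.SubId
FROM dbo.PanelCdCl CdCl WITH (NOLOCK)
    INNER JOIN dbo.PanelHistory PH ON PH.SubId = CdCl.SubId
WHERE CdCl.PanelCdClStatusId IS NULL AND PH.LastProcessNumber >= 1605
ORDER BY CdCl.SubId
OPTION (HASH JOIN);

SET STATISTICS IO OFF;
```

Comparing the logical reads reported for each variant against the original Merge-Join plan gives a rough idea of why the optimizer preferred the two ordered scans.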
