Randy Minder
Randy Minder

Reputation: 48402

Need some assistance understanding a SQL Server 2012 query plan

I have the following query:

Select TOP 5000
    CdCl.SubId
From dbo.PanelCdCl CdCl WITH (NOLOCK)
    Inner Join dbo.PanelHistory PH ON PH.SubId = CdCl.SubId
Where CdCl.PanelCdClStatusId IS NULL And PH.LastProcessNumber >= 1605
Order By CdCl.SubId

The query plan looks as follows:

enter image description here

Both the PanelCdCl and PanelHistory tables have a clustered index / primary key on SubId, and it's the only column in the index. There is exactly one row for each SubId in each table. Both tables have ~35M total rows in them.

I'm curious why the query plan is showing a clustered index scan on PanelHistory when the join is being done on the clustered index column.

Upvotes: 2

Views: 68

Answers (2)

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

The merge join operator needs two sorted inputs. The clustered key is SubId in both tables which means that the scan in PanelHistory will give the rows in correct order. The clustered key is included in all non clustered key indexes so because of that you will have all rows in NCI IX_PanelCdCl_PanelCdClStatusId where PanelCdClStatusId is null ordered by SubId as well so that can also be used directly by the merge join.

What you see here is actually two scans, one of the clustered key in PanelHistory with a residual predicate on LastProcessNumber > 1605 and one index range scan in IX_PanelCdCl_PanelCdClStatusId as long as PanelCdClStatusId is null.

They will however not scan the entire table/index. The query is executed from left to right in the query plan where select is asking for one row at a time until there is no more rows to be had. That means that the top operator will stop asking for new rows from the merge join when it has the required 5000 rows.

Upvotes: 1

RBarryYoung
RBarryYoung

Reputation: 56725

It's not scanning PanelHistory's clustered index(SubId) to find a SubId, it's scanning on it to find all rows where LastProcessNumber >= 1605. This is the first logical step.

Then it likewise scans PanelCdCl to find all non-null PanelCdClStatusId rows. Then since they had the same index (SubId), they are both already sorted on the Join column, so it can do a Merge-Join without an additional sort. (Merge-Join is almost always the most efficient if it doesn't have to re-sort the input rows).

Then it doesn't have to do a Sort for the ORDER BY, because it's already in SubId order.

And finally, it does the TOP, which has to be after everything else (by the rules of SQL clause logical execution ordering).

So the only place it tests SubId values is in the Merge-Join, it never pushes it down to the scans. This would probably remain true if it did a Hash-Join instead. Only for a Nested-Loop Join would it have to push the SubId test down as a seek on a table, and that should only be the lower branch, not the upper one.

Upvotes: 3

Related Questions