Reputation: 48402
I have the following query:
Select TOP 5000
CdCl.SubId
From dbo.PanelCdCl CdCl WITH (NOLOCK)
Inner Join dbo.PanelHistory PH ON PH.SubId = CdCl.SubId
Where CdCl.PanelCdClStatusId IS NULL And PH.LastProcessNumber >= 1605
Order By CdCl.SubId
The query plan looks as follows:
Both the PanelCdCl and PanelHistory tables have a clustered index / primary key on SubId, and it's the only column in the index. There is exactly one row for each SubId in each table. Both tables have ~35M total rows in them.
I'm curious why the query plan is showing a clustered index scan on PanelHistory when the join is being done on the clustered index column.
Upvotes: 2
Views: 68
Reputation: 138960
The merge join operator needs two sorted inputs. The clustered key is SubId
in both tables which means that the scan in PanelHistory
will give the rows in correct order. The clustered key is included in all non clustered key indexes so because of that you will have all rows in NCI IX_PanelCdCl_PanelCdClStatusId
where PanelCdClStatusId is null
ordered by SubId
as well so that can also be used directly by the merge join.
What you see here is actually two scans, one of the clustered key in PanelHistory
with a residual predicate on LastProcessNumber > 1605
and one index range scan in IX_PanelCdCl_PanelCdClStatusId
as long as PanelCdClStatusId is null
.
They will however not scan the entire table/index. The query is executed from left to right in the query plan where select
is asking for one row at a time until there is no more rows to be had. That means that the top operator will stop asking for new rows from the merge join when it has the required 5000 rows.
Upvotes: 1
Reputation: 56725
It's not scanning PanelHistory's clustered index(SubId) to find a SubId, it's scanning on it to find all rows where LastProcessNumber >= 1605
. This is the first logical step.
Then it likewise scans PanelCdCl to find all non-null PanelCdClStatusId
rows. Then since they had the same index (SubId), they are both already sorted on the Join column, so it can do a Merge-Join without an additional sort. (Merge-Join is almost always the most efficient if it doesn't have to re-sort the input rows).
Then it doesn't have to do a Sort for the ORDER BY, because it's already in SubId
order.
And finally, it does the TOP, which has to be after everything else (by the rules of SQL clause logical execution ordering).
So the only place it tests SubId
values is in the Merge-Join, it never pushes it down to the scans. This would probably remain true if it did a Hash-Join instead. Only for a Nested-Loop Join would it have to push the SubId
test down as a seek on a table, and that should only be the lower branch, not the upper one.
Upvotes: 3