Reputation: 10701
Let us have the following table in SQL Server 2016
-- generating 1M test table with four attributes
WITH x AS
(
SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)
), t1 AS
(
SELECT ones.n + 10 * tens.n + 100 * hundreds.n + 1000 * thousands.n + 10000 * tenthousands.n + 100000 * hundredthousands.n as id
FROM x ones, x tens, x hundreds, x thousands, x tenthousands, x hundredthousands
)
SELECT id,
id % 50 predicate_col,
row_number() over (partition by id % 50 order by id) join_col,
LEFT('Value ' + CAST(CHECKSUM(NEWID()) AS VARCHAR) + ' ' + REPLICATE('*', 1000), 1000) as padding
INTO TestTable
FROM t1
GO
-- setting the `id` as a primary key (therefore, creating a clustered index)
ALTER TABLE TestTable ALTER COLUMN id int not null
GO
ALTER TABLE TestTable ADD CONSTRAINT pk_TestTable_id PRIMARY KEY (id)
-- creating a non-clustered index
CREATE NONCLUSTERED INDEX ix_TestTable_predicate_col_join_col
ON TestTable (predicate_col, join_col)
GO
Ok, and now when I run the following queries having just slightly different predicates (b.predicate_col <= 0 vs. b.predicate_col = 0) I got completely different plans.
-- Q1
select b.id, b.predicate_col, b.join_col, b.padding
from TestTable b
join TestTable a on b.join_col = a.id
where a.predicate_col = 1 and b.predicate_col <= 0
option (maxdop 1)
-- Q2
select b.id, b.predicate_col, b.join_col, b.padding
from TestTable b
join TestTable a on b.join_col = a.id
where a.predicate_col = 1 and b.predicate_col = 0
option (maxdop 1)
If I look on query plans, then it is clear that he chooses to join the key lookup together with non-clustered index seek first and then he does the final join with non-clustered index in the case of Q1 (which is bad). A much better solution is in the case of Q2: he joins the non-clustered indexes first and then he does the final key lookup.
The question is: why is that and can I improve it somehow?
In my intuitive understanding of histograms, it should be easy to estimate the correct result for both variants of predicates (b.predicate_col <= 0 vs. b.predicate_col = 0
), therefore, why different query plans?
EDIT:
Actually, I do not want to change the indexes or physical structure of the table. I would like to understand why he picks up such a bad query plan in the case of Q1. Therefore, my question is precisely like this: Why he picks such a bad query plan in the case of Q1 and can I improve without altering the physical design?
I have checked the result estimations in the query plan and both query plans have exact row number estimations of every operator! I have checked the result memo structure (OPTION (QUERYTRACEON 3604, QUERYTRACEON 8615, QUERYTRACEON 8620)
) and rules applied during the compilation (OPTION (QUERYTRACEON 3604, QUERYTRACEON 8619, QUERYTRACEON 8620)
) and it seems that he finish the query plan search once he hit the first plan. Is this the reason for such behaviour?
Upvotes: 11
Views: 510
Reputation: 1931
This is caused by SQL Server's inability to use Index Columns to the Right of the Inequality search.
This code produces the same issue:
SELECT * FROM TestTable WHERE predicate_col <= 0 and join_col = 1
SELECT * FROM TestTable WHERE predicate_col = 0 and join_col <= 1
Inequality queries such as >= or <= put a limitation on SQL, the Optimiser can't use the rest of the columns in the index, so when you put an inequality on [predicate_col] you're rendering the rest of the index useless, SQL can't make full use of the index and produces an alternate (bad) plan. [join_col] is the last column in the Index so in the second query SQL can still make full use of the index.
The reason SQL opts for the Hash Match is because it can't guarantee the order of the data coming out of table B. The inequality renders [join_col] in the index useless so SQL has to prepare for unsorted data on the join, even though the row count is the same.
The only way to fix your problem (even though you don't like it) is to alter the Index so that Equality columns come before Inequality columns.
Upvotes: 1
Reputation: 5094
Ok answer can be from Statistics and histogram
point of view also.
Answer can be from index structure
arrangement point of view also.
Ok I am trying to answer this from index structure
.
Although you get same result in both query because there is no predicate_col < 0 records
When there is Range predicate
in composite index
,both the index are not utilise. There can also be so many other reason of index not being utilise.
-- Q1
select b.id, b.predicate_col, b.join_col, b.padding
from TestTable b
join TestTable a on b.join_col = a.id
where a.predicate_col = 1 and b.predicate_col <= 0
option (maxdop 1)
If we want plan like in Q2 then we can create another composite index.
-- creating a non-clustered index
CREATE NONCLUSTERED INDEX ix_TestTable_predicate_col_join_col_1
ON TestTable (join_col,predicate_col)
GO
We get query plan exactly like Q2.
Another way is to define CHECK constraint
in predicate_col
Alter table TestTable ADD check (predicate_col>=0)
GO
This also give same query plan as Q2.
Though in real table and data, whether you can create CHECK Constraint
or create another composite index
or not is another discussion.
Upvotes: 0