Reputation: 53
The following two queries are performed with SQL SERVER
SELECT TOP(10) [c].[Name] AS [I0], [c].[Surname] AS [I1],(
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WHERE ([c].[Id] = [r].[ColId]) AND [r].[ColId] IS NOT NULL) AS [I7], [c].[Id] AS [I10]
FROM [Presences].[Cols] AS [c]
SELECT TOP(10) [c].[Code] AS [I0], [c].[Description] AS [I1], (
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WHERE ([c].[Id] = [r].[CantId]) AND [r].[CantId] IS NOT NULL) AS [I9], [c].[Id] AS [I10]
FROM [Presences].[Cants] AS [c]
The second query is evaluated in less than 1 second but the first one takes more than 20 seconds.
Infact the generated execution plan for the first one is very different:
Bad execution plan of the first query
if compared with the second one:
Good execution plan of the second query
It is unclear to me why an index seek is not choosen.
Those are the indexes declared on the table [Regs]:
CREATE NONCLUSTERED INDEX [IX_Regs_Date] ON [Presences].[Regs]
(
[Date] ASC
)
CREATE NONCLUSTERED INDEX [IX_Regs_ColId] ON [Presences].[Regs]
(
[ColId] ASC
)
CREATE NONCLUSTERED INDEX [IX_Regs_CantId] ON [Presences].[Regs]
(
[CantId] ASC
)
Table [Cols] has around 600 rows and table [Cants] 21000.
The interesting fact is that the following query (using FORCESEEK) generates the correct execution plan:
SELECT TOP(10) [c].[Name] AS [I0], [c].[Surname] AS [I1],(
SELECT MAX([r].[Date])
FROM [Presences].[Regs] AS [r]
WITH (FORCESEEK)
WHERE ([c].[Id] = [r].[ColId]) AND [r].[ColId] IS NOT NULL) AS [I8]
FROM [Presences].[Cols] AS [c]
ORDER BY [c].[Name], [c].[Surname]
but I can't specify this hint because the queries are generated using an ORM.
If you need further informations I will be glad to provide them.
Upvotes: 1
Views: 110
Reputation: 453898
Your problem query is
SELECT TOP(10) c.Name AS I0,
c.Surname AS I1,
(SELECT MAX(r.Date)
FROM Presences.Regs AS r
WHERE ( c.Id = r.ColId )) AS I7,
c.Id AS I10
FROM Presences.Cols AS c
ORDER BY c.Name, c.Surname
In both the good and the bad plan the top is the same
It scans the Cols
table, sorts by Name, Surname
and then proceeds to calculate the MAX(Date)
in Presences.Regs
where r.ColId
matches the corresponding c.Id
from the outer row.
The ideal index for this would be one on ColId, Date
- so it can seek into the index and just read the last one for that ColId
.
You don't have this index though so it has two choices
IX_Regs_Date
- this is in date order so it can stop the scan as soon as it finds the first row matching c.Id = r.ColId
IX_Regs_ColId
. Get all rows matching the predicate and then aggregate them to find the MAX
.For option 1 it estimates that it will only have to read 283
rows per scan on average before finding the first one matching c.Id = r.ColId
. In reality it reads 1,358,719
per execution (and there are 10 executions so this is a corresponding 13.58 million key lookups). There are 1,517,230
rows in the table so it looks as though for some reason all the ones matching the join are congregated towards later dates in the table.
The easiest way of fixing this would be to give it the ideal index of ColId, Date
- this covers the query so removes the lookup that is present even in the "good" plan and is an obvious best choice so will remove the temptation for SQL Server to select the "bad" plan.
--Suggested replacement for IX_Regs_ColId
CREATE NONCLUSTERED INDEX [IX_Regs_ColId_Date] ON [Presences].[Regs]
(
[ColId] ASC,
[Date] ASC
)
Upvotes: 2