Reputation: 2200
Similar to this question but the advice given doesn't seem to work in my case
I have a #temp table that looks something like this:
CREATE TABLE #C (
OutA int,
OutB int,
OutC int,
SortD float,
PartE nvarchar(400),
PartF nvarchar(400)
)
For now it contains approx. 10M rows, although I need it to work with many more rows than this (I've limited it so that checking the query plan doesn't take all day!)
This table has the following indexes and statistics:
CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC)
CREATE INDEX C ON #C(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC)
CREATE INDEX D ON #C(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC)
CREATE STATISTICS E ON #C (OutA);
CREATE STATISTICS F ON #C (OutB);
CREATE STATISTICS G ON #C (OutC);
(I'm not sure why the statistics are needed, it seemed to make no difference to the plan, yet it complained when they were missing)
Finally, I am trying to create 3 different rankings on this data using different partitions but the same sort order
SELECT OutA,
OutB,
OutC,
RANK() OVER (ORDER BY SortD DESC) AS [Rank1],
RANK() OVER (PARTITION BY PartE ORDER BY SortD DESC) AS [Rank2],
RANK() OVER (PARTITION BY PartF ORDER BY SortD DESC) AS [Rank3]
INTO #Junk1
FROM #C
This produces this actual query plan
This takes nearly 2 mins to run.
As you can see, there are multiple expensive sorts in this plan. According to the linked question the indexes that I've created should be useable here, yet they are not used.
If I instead create 3 separate queries, 1 for each rank, in this case the indexes are used as expected. I did try to make use of this and join the 3 query results to produce the same output, but this actually took very slightly longer overall
I also tried modifying the indexes to include the partition columns as follows:
CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC, PartE, PartF)
CREATE INDEX C ON #C(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC, PartF)
CREATE INDEX D ON #C(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC, PartE)
This succeeded in removing the first (rightmost) sort, by changing from a tablescan to an index scan (B). But the other sorts remain
Why are the indexes not used when multiple rankings are required? How can I eliminate the expensive sorts?
Upvotes: 0
Views: 344
Reputation: 2200
Not really an answer to the why, but I did find a workaround.
Instead of creating all 3 indexes and then trying to do all 3 rankings at once. If we instead do it in steps, adding 1 index + 1 ranking at a time, then it does use the indexes.
(N.B. This is not the same as generating the 3 rankings and then joining)
CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC, PartE, PartF)
CREATE STATISTICS E ON #C (OutA); -- not sure if
CREATE STATISTICS F ON #C (OutB); -- these statistics
CREATE STATISTICS G ON #C (OutC); -- are still needed
SELECT OutA,
OutB,
OutC,
PartE,
PartF,
SortD,
RANK() OVER (ORDER BY SortD DESC) AS [Rank1]
INTO #Intermediate1
FROM #C
CREATE INDEX C ON #Intermediate1(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC, PartF, Rank1) -- Need to include Rank1 here for this index to be useable
SELECT OutA,
OutB,
OutC,
Rank1,
PartF,
SortD,
RANK() OVER (PARTITION BY PartE ORDER BY SortD DESC) AS [Rank2]
INTO #Intermediate2
FROM #Intermediate1
-- could drop #Intermediate1 at this point
CREATE INDEX D ON #Intermediate2(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC, Rank1, Rank2) --PartE no longer required
SELECT OutA,
OutB,
OutC,
Rank1,
Rank2,
RANK() OVER (PARTITION BY PartF ORDER BY SortD DESC) AS [Rank3]
FROM #Intermediate2
N.B. doing it this way probably requires more temp space - there may be a tradeoff required.
I think the requirement to include the Rank1 and Rank2 columns into the indexes may point towards why doing all 3 in one go doesn't use the indexes. Examining the Sort we can see [Expr*] included as an output field: - which I think is the output of the previous Ranking.
This feels to me like something that should be fixable within SQL's optimiser?
Upvotes: 0