DJL
DJL

Reputation: 2200

Multiple ranks (with different partitions, but the same sort order) not using indexes

Similar to this question but the advice given doesn't seem to work in my case

I have a #temp table that looks something like this:

  CREATE TABLE #C (
    OutA int,
    OutB int,
    OutC int,
    SortD float,
    PartE nvarchar(400),
    PartF nvarchar(400)
  )

For now it contains approx. 10M rows, although I need it to work with many more rows than this (I've limited it so that checking the query plan doesn't take all day!)

This table has the following indexes and statistics:

  CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC)
  CREATE INDEX C ON #C(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC)
  CREATE INDEX D ON #C(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC)

  CREATE STATISTICS E ON #C (OutA);  
  CREATE STATISTICS F ON #C (OutB);  
  CREATE STATISTICS G ON #C (OutC);  

(I'm not sure why the statistics are needed, it seemed to make no difference to the plan, yet it complained when they were missing)

Finally, I am trying to create 3 different rankings on this data using different partitions but the same sort order

  SELECT      OutA,
              OutB,
              OutC,
              RANK() OVER (ORDER BY SortD DESC) AS [Rank1],
              RANK() OVER (PARTITION BY PartE ORDER BY SortD DESC) AS [Rank2],
              RANK() OVER (PARTITION BY PartF ORDER BY SortD DESC) AS [Rank3]
  INTO        #Junk1
  FROM        #C

This produces this actual query plan Actual query plan

This takes nearly 2 mins to run.

As you can see, there are multiple expensive sorts in this plan. According to the linked question the indexes that I've created should be useable here, yet they are not used.

If I instead create 3 separate queries, 1 for each rank, in this case the indexes are used as expected. I did try to make use of this and join the 3 query results to produce the same output, but this actually took very slightly longer overall

I also tried modifying the indexes to include the partition columns as follows:

  CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC, PartE, PartF)
  CREATE INDEX C ON #C(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC, PartF)
  CREATE INDEX D ON #C(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC, PartE)

This succeeded in removing the first (rightmost) sort, by changing from a tablescan to an index scan (B). But the other sorts remain

Why are the indexes not used when multiple rankings are required? How can I eliminate the expensive sorts?

Upvotes: 0

Views: 344

Answers (1)

DJL
DJL

Reputation: 2200

Not really an answer to the why, but I did find a workaround.

Instead of creating all 3 indexes and then trying to do all 3 rankings at once. If we instead do it in steps, adding 1 index + 1 ranking at a time, then it does use the indexes.

(N.B. This is not the same as generating the 3 rankings and then joining)

  CREATE INDEX B ON #C(SortD DESC) INCLUDE (OutA, OutB, OutC, PartE, PartF)

  CREATE STATISTICS E ON #C (OutA);   -- not sure if
  CREATE STATISTICS F ON #C (OutB);   -- these statistics
  CREATE STATISTICS G ON #C (OutC);   -- are still needed

  SELECT  OutA,
          OutB, 
          OutC,
          PartE,
          PartF,
          SortD,
          RANK() OVER (ORDER BY SortD DESC) AS [Rank1]
  INTO    #Intermediate1
  FROM    #C

  CREATE INDEX C ON #Intermediate1(PartE, SortD DESC) INCLUDE (OutA, OutB, OutC, PartF, Rank1) -- Need to include Rank1 here for this index to be useable

  SELECT  OutA,
          OutB,
          OutC,
          Rank1,
          PartF,
          SortD,
          RANK() OVER (PARTITION BY PartE ORDER BY SortD DESC) AS [Rank2]
  INTO    #Intermediate2
  FROM    #Intermediate1

-- could drop #Intermediate1 at this point

  CREATE INDEX D ON #Intermediate2(PartF, SortD DESC) INCLUDE (OutA, OutB, OutC, Rank1, Rank2) --PartE no longer required
  SELECT  OutA,
          OutB,
          OutC,
          Rank1,
          Rank2,
          RANK() OVER (PARTITION BY PartF ORDER BY SortD DESC) AS [Rank3]
  FROM    #Intermediate2

N.B. doing it this way probably requires more temp space - there may be a tradeoff required.

I think the requirement to include the Rank1 and Rank2 columns into the indexes may point towards why doing all 3 in one go doesn't use the indexes. Examining the Sort we can see [Expr*] included as an output field: Expr - which I think is the output of the previous Ranking.

This feels to me like something that should be fixable within SQL's optimiser?

Upvotes: 0

Related Questions