cusimar9
cusimar9

Reputation: 5259

SQL Performance using ROW_NUMBER and dynamic order by

I realise this is a common issue and has been discussed on SO previously, but I thought I would raise the question again in the hope that some viable alternative can be found.

Take the following SQL which combines paging with dynamic ordering:

WITH CTE AS (
  SELECT 
      OrderID,
      ROW_NUMBER() OVER (ORDER BY
        CASE WHEN @SortCol='OrderID' THEN OrderID END ASC,
        CASE WHEN @SortCol='CustomerName' THEN Surname END ASC 
      ) AS ROW_ID
  FROM Orders WHERE X
)

SELECT Orders.* FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1;

As is well known, the ROW_NUMBER() method does not work well on large tables since the Indexes on the table cannot be properly used when using multiple CASE statements in the ORDER BY clause (see link).

The solution we've been using for a number of years is to construct a string which is then executed using sp_executesql. Performance is good when using dynamic SQL like this but the resulting code is awful from a legibility point of view.

I have heard of the ROWCOUNT method but as far as I'm aware it is still susceptible to the same issues when you introduce the dynamic order by element.

So, at the risk of asking the impossible, what other options are there?

EDIT

In order to make some useful progress here I have put together three queries highlighting the various suggested methods:

  1. Current, Dynamic SQL solution (execution time 147ms)

  2. gbn Solution (execution time 1687ms)

  3. Anders Solution (execution time 1604ms)

  4. Muhmud Solution (execution time 46ms)

Upvotes: 15

Views: 12751

Answers (6)

Arun Prasad E S
Arun Prasad E S

Reputation: 10115

Without CTE Generating Row number when dynamic order conditions are present

select TotalCount = COUNT(U.UnitID) OVER() ,

ROW_NUMBER() over(
    order by 
     (CASE @OrderBy WHEN '1' THEN m.Title  END) ASC ,
     (CASE @OrderBy WHEN '2' THEN m.Title  END) DESC,
     (CASE @OrderBy WHEN '3' THEN Stock.Stock  END) DESC,
     (CASE @OrderBy WHEN '4' THEN Stock.Stock   END) DESC

) as RowNumber,

M.Title,U.ColorCode,U.ColorName,U.UnitID, ISNULL(Stock.Stock,0) as Stock

 from tblBuyOnlineMaster M

    inner join BuyOnlineProductUnitIn U on U.BuyOnlineID=M.BuyOnlineID 

    left join 
            ( select IT.BuyOnlineID,IT.UnitID,ISNULL(sum(IT.UnitIn),0)-ISNULL(sum(IT.UnitOut),0) as Stock 
                from [dbo].[BuyOnlineItemTransaction] IT 
                group by IT.BuyOnlineID,IT.UnitID
             ) as Stock

        on U.UnitID=Stock.UnitID


order by 
 (CASE @OrderBy WHEN '1' THEN m.Title  END) ASC ,
 (CASE @OrderBy WHEN '2' THEN m.Title  END) DESC,
 (CASE @OrderBy WHEN '3' THEN Stock.Stock  END) DESC,
 (CASE @OrderBy WHEN '4' THEN Stock.Stock   END) DESC


offset  @offsetCount rows fetch next 6 rows only 

Upvotes: 0

Devart
Devart

Reputation: 121922

(Edited)

DECLARE 
      @OrderColumnName SYSNAME
    , @RowStart INT
    , @RowCount INT
    , @TopCount INT

SELECT 
      @OrderColumnName = 'EmployeeID'
    , @RowStart = 5
    , @RowCount = 50
    , @TopCount = @RowStart + @RowCount – 1

@muhmud's solution -

; WITH data AS 
(
    SELECT 
          wo.WorkOutID
        , RowIDByEmployee = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID)
        , RowIDByDateOut = ROW_NUMBER() OVER (ORDER BY wo.DateOut)  
    FROM dbo.WorkOut wo
), CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = RowIDByEmployee 
    FROM data wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = RowIDByDateOut
    FROM data wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1,
read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob
physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 1295 ms,  elapsed time = 3048 ms.

Solution without data common table expression -

;WITH CTE AS 
(         
    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID BETWEEN @RowStart AND @RowCount + @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14254, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1296 ms,  elapsed time = 3049 ms.

Solution with TOP -

;WITH CTE AS 
(         
    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.EmployeeID) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'EmployeeID'

    UNION ALL

    SELECT TOP (@TopCount)
          wo.WorkOutID
        , RowID = ROW_NUMBER() OVER (ORDER BY wo.DateOut) 
    FROM dbo.WorkOut wo
    WHERE @OrderColumnName = 'DateOut'
)
SELECT wo.*  
FROM CTE t
JOIN dbo.WorkOut wo ON t.WorkOutID = wo.WorkOutID
WHERE t.RowID > @RowStart - 1
ORDER BY t.RowID
OPTION (RECOMPILE)

Table 'WorkOut'. Scan count 3, logical reads 14246, physical reads 1, read-ahead reads 14017, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1248 ms,  elapsed time = 2864 ms.

enter image description here

Upvotes: 2

muhmud
muhmud

Reputation: 4604

How about this:

WITH data as (
    SELECT OrderID,
            ROW_NUMBER() OVER ( ORDER BY OrderID asc) as OrderID_ROW_ID,
            ROW_NUMBER() OVER ( ORDER BY Surname asc) as Surname_ROW_ID
        FROM Orders --WHERE X   
), CTE AS (               
        SELECT OrderID, OrderID_ROW_ID as ROW_ID
        FROM data
        where @SortCol = 'OrderID'

        union all

        SELECT OrderID, Surname_ROW_ID
        FROM data
        where @SortCol = 'Surname'
)
SELECT Orders.*, ROW_ID FROM CTE
INNER JOIN Orders ON CTE.OrderID = Orders.OrderID       
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1
order by ROW_ID
option (recompile);

Edit: Using option (recompile) on the example query in the post makes it go much quicker. However, case could not be used in exactly that way to choose between ascending/descending order.

The reason for this is that a plan is being generated for values of the variables that are inappropriate, and then this plan is cached. Forcing a recompile allows it to use the actual values of the variables.

Upvotes: 7

user806549
user806549

Reputation:

I would try something similar to this:

WITH CTEOrder AS (
  SELECT 
     OrderID,
     ROW_NUMBER() OVER (ORDER BY OrderID ASC) AS ROW_ID
  FROM Orders 
)
, CTECustomerName AS (
  SELECT 
     OrderID,
     ROW_NUMBER() OVER (ORDER BY Surname ASC) AS ROW_ID
  FROM Orders 
)
, CTECombined AS
(
  SELECT 'OrderID' OrderByType, OrderID, Row_ID
    FROM CTEOrder
   WHERE Row_id BETWEEN @RowStart AND @RowStart + @RowCount -1
  UNION
  SELECT 'CustomerName' OrderByType, OrderID, Row_ID
    FROM CTECustomerName
   WHERE row_id BETWEEN @RowStart AND @RowStart + @RowCount -1
)
SELECT Orders.* FROM CTECombined
INNER JOIN Orders ON CTECombined.OrderID = Orders.OrderID       
WHERE ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1;  
AND OrderByType = @SortCol

I tried this with one of my own tables, that has app. 4 million records. Obviously, it has different field names, so my apologies if I haven't 'translated' this properly in the answer and the SQL doesn't run for you. However, the idea should be obvious.

With the code in your question, I get app 200000 logical reads and 6068 ms CPU on my table and with the above I get 1422 logical reads and 78 ms CPU.

I didn't flush cache or any other things required for a real benchmark, but I did try it with different pages, pagesizes, etc. and my results were consistent from the start.

If you have queries with many different fields that you would want to order by, this solution may not scale sufficiently since you'll have to expand the number of CTEs, but you could do that in code if you are constructing SQL anyway - and for the example with sorting on two different fields it works like a charm for me.

EDIT: Come to think of it, you probably won't need individual CTEs for each OrderBy-column, you'll probably be able to just have one where you do both ROW_NUMBER() and UNION in the same CTE. The principle is the same, and I would think the optimizer would end up doing the same, but I haven't benchmarked that yet. I'll update the answer if and when I get the time to verify that.

EDIT 2: As expected, you can do the UNION within one CTE. I'm not going to update the code, but I'll provide some benchmarks on the code as is. I did a pagesize of 10000 rows to see if that made a big difference, but it didn't. (The two runs of cusinar9's and my code respectively were run equivalently, so the cold start had identical parameters for both versions of the code, and the second run had different parameters, but the same for the two version of the code):

cusimar9's code, cold start:

Table 'TestTable'. Scan count 10009, logical reads 43080, physical reads 189, read-ahead reads 12915, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 3037 ms,  elapsed time = 2206 ms.

cusimar9's code, 2nd run, different parameters:

Table 'TestTable'. Scan count 10009, logical reads 43096, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 4132 ms,  elapsed time = 1012 ms.

My suggestion, cold start:

Table 'TestTable'. Scan count 10001, logical reads 31963, physical reads 12, read-ahead reads 6984, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 218 ms,  elapsed time = 1410 ms.

My suggestion, 2nd run:

Table 'TestTable'. Scan count 10001, logical reads 31963, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 218 ms,  elapsed time = 358 ms.

Edit 3: Seeing your posted code, I noted this construction:

PagedCTE AS (
    SELECT (SELECT Max(ROW_ID) FROM OrderByCTE) AS TOTAL_ROWS, OrderID 
    FROM OrderByCTE             
    WHERE
        OrderByCTE.SortCol = @SortCol AND OrderByCTE.SortDir = @SortDir AND
        OrderByCTE.ROW_ID BETWEEN @RowStart AND @RowStart + @RowCount -1
)

I'm not 100% certain about the purpose of this, but I'm guessing you want to return the total of rows so that you can calculate (and display to the user) how far into the recordset we are. So why not just get that number outside of all the ORDER BY noise? To stick to the current style, why not make a CTE with a SELECT COUNT(*)? And then join that in your final select?

Upvotes: 1

Metaphor
Metaphor

Reputation: 6405

This is one case where procedural patterns fail for stored procedures. Your attempt to parametrize the attributes used by the optimizer don't let the optimizer do it's job.

In this case, I would use 2 stored procedures.

If you are really set on using parameters, then:

DECLARE @strSQL varchar(1000) = 
    'SELECT OrderID,ROW_NUMBER() OVER ( ORDER BY '                                                                                        
    + @SortCol +
    ' ASC) AS ROW_ID FROM Orders WHERE X ' +
    ' AND ROW_ID BETWEEN @RowStart and @RowStart + @RowCount - 1;'                                        

EXECUTE (@StrSQL)

Upvotes: 0

gbn
gbn

Reputation: 432271

Try this. This should leverage indexes you have

WITH CTE AS (               
        SELECT 
            Orders.*,
            ROW_NUMBER() OVER (ORDER BY OrderID) AS rnOrderID,
            ROW_NUMBER() OVER (ORDER BY Surname) AS rnSurname                                                                         
        FROM Orders WHERE X                                         
    )
SELECT CTE.*
FROM CTE
WHERE
   CASE @SortCol
       WHEN 'OrderID' THEN rnOrderID
   END BETWEEN @RowStart AND @RowStart + @RowCount -1; 

However, for large datasets (100,000s and more) there are other techniques such as https://web.archive.org/web/20211020131201/https://www.4guysfromrolla.com/webtech/042606-1.shtml

Upvotes: 1

Related Questions