Rudolf Dvoracek
Rudolf Dvoracek

Reputation: 901

Custom sorting and pagination over large tables

I'm used to get benefits from ROW_NUMBER function in MS SQL Server scripts since 2005 version. But I noticed there is big performance disadvantage querying big tables using this function.

Imagine table with four columns (a real table from external database has more columns, but I used only those to avoid complexity of example):

DECLARE TABLE StockItems (
  Id int PRIMARY KEY IDENTITY(1,1),
  StockNumber nvarchar(max),
  Name nvarchar(max),
  [Description] nvarchar(max))

I've written procedure for querying this table filled up by 200 000+ rows with following parameters:

Query:

SELECT sortedItems.Id
    ,si.StockNumber
    ,si.Name
    ,si.Description
FROM (SELECT s.Id
         ,CASE WHEN @SortDirection=1 THEN
            CASE
               WHEN CHARINDEX('Name',@SortExpression)=1 THEN
                 ROW_NUMBER() OVER (ORDER by s.Name DESC)
               WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN
                 ROW_NUMBER() OVER (ORDER by s.StockNumber DESC)
            ELSE ROW_NUMBER() OVER (ORDER by s.StockNumber DESC)
            END
          ELSE    
            CASE
               WHEN CHARINDEX('Name',@SortExpression)=1 THEN
                  ROW_NUMBER() OVER (ORDER by s.Name ASC)
               WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN
                  ROW_NUMBER() OVER (ORDER by s.StockNumber ASC)
            ELSE  ROW_NUMBER() OVER (ORDER by s.StockNumber ASC)
            END
          END AS RowNo
       FROM stockItems s
     ) as sortedItems
INNER JOIN StockItems si ON sortedItems.Id=si.Id
ORDER BY sortedItems.RowNo

In situation when number of rows is growing rapidly, ROW_NUMBER became ineffective, because must sort all rows.

Please can you help me to avoid this performance disadvantage and speed up the query?

Upvotes: 2

Views: 1855

Answers (3)

Rudolf Dvoracek
Rudolf Dvoracek

Reputation: 901

I've found solution how to avoid performance penalty using ROW_NUMBER() function over large result sets. A goal I didn't write in my question was to avoid declaring query as nvarchar variable and executing it, because it can cause open door for SQL injection.

So, solution is to query data as much as possible in required sort order, then query result set and switch ordering and get data only for current page. Finally I can take result ordered in opposite order and order them again.

I defined new variable @innerCount to query most inner result set and order it as query client specify in @sortExpression and @sortDirection variables

SET @innerCount = @startRowIndex + @maximumRows

Select OppositeQuery.Id
,s.StockNumber
,s.Name
,s.Description
FROM (SELECT TOP (@maximumRows) InnerItems.Id
       FROM
            (SELECT TOP (@innerCount) sti.Id
               FROM stockItems sti
               ORDER BY
                CASE WHEN @SortDirection=1 THEN
                    CASE
                        WHEN CHARINDEX('Name',@SortExpression)=1 THEN sti.Name
                        WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN sti.StockNumber
                        ELSE sti.StockNumber
                    END
                END DESC
                CASE WHEN ISNULL(@SortDirection,0)=0 THEN
                    CASE
                       WHEN CHARINDEX('Name',@SortExpression)=1 THEN sti.Name
                       WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN sti.StockNumber
                       ELSE sti.StockNumber
                    END
                END ASC
             ) as InnerQuery
          INNER JOIN StockItems si on InnerQuery.Id=si.Id
          ORDER BY
            CASE WHEN @SortDirection=1 then
                CASE
                   WHEN CHARINDEX('Name',@SortExpression)=1 THEN si.Name
                   WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN si.StockNumber
                   ELSE si.StockNumber
               END
            END ASC
            CASE WHEN ISNULL(@SortDirection,0)=0 then
                CASE
                   WHEN CHARINDEX('Name',@SortExpression)=1 THEN si.Name
                   WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN si.StockNumber
                   ELSE si.StockNumber
                END
            END ASC
    ) AS OppositeQuery
INNER JOIN StockItems s on OppositeQuery.Id=s.Id
ORDER BY
CASE WHEN @SortDirection=1 THEN
    CASE
        WHEN CHARINDEX('Name',@SortExpression)=1 THEN s.Name
        WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN s.StockNumber
        ELSE s.StockNumber
     END
END DESC
CASE WHEN ISNULL(@SortDirection,0)=0 THEN
    CASE
        WHEN CHARINDEX('Name',@SortExpression)=1 THEN s.Name
        WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN s.StockNumber
        ELSE s.StockNumber
    END
END ASC

Disadvantage of this approach is that I have to sort data three times, but in case of multiple inner joins to StockItems table subqueries are much faster than using ROW_NUMBER() function.

Thank to all contributors for help.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269883

You can move the case expression to an order by clause:

order by (case when @SortDirection=1 and CHARINDEX('Name',@SortExpression)=1 then s.name end) desc,
         (case when @SortDirection=1 and CHARINDEX('StockNumber',@SortExpression)=1 then s.StockNumber end) desc,
         (case when @SortDirection=1 and (CHARINDEX('StockNumber',@SortExpression)<>1 and CHARINDEX('Name',@SortExpression)<>1) then va.match end) desc,
         (case when @SortDirection<>1 and CHARINDEX('Name',@SortExpression)=1 then s.name end) asc,
         (case when @SortDirection<>1 and CHARINDEX('StockNumber',@SortExpression)=1 then s.StockNmber end) asc,
         (case when @SortDirection<>1 and (CHARINDEX('StockNumber',@SortExpression)<>1 and CHARINDEX('Name',@SortExpression)<>1) then va.match end) asc

I notice the expression has va.match, so it doesn't really match any tables in your query. So, I'm just putting in the order by expression.

And, yes, as the table gets bigger, this is going to take more time. I don't know that the order by will be more efficient than the row_number(), but it is possible.

If you need to order the rows, then you have to do a sort, one way or another (perhaps you could use an index instead). If you don't care about the order, you could take your chances with:

row_number() over (order by (select NULL))

In SQL Server, I have found that this assigns a sequential number without a separate sort. However, this is not guaranteed (I haven't found any documentation to support this use). And, the result is not necessarily stable from one run to the next.

Upvotes: 0

jlee88my
jlee88my

Reputation: 3043

Check the execution path. ROW_NUMBER() does not have big impact as long as you have the correct index. The problem with your query isn't in the ROW_NUMBER(). Use dynamic instead, it will eliminate the 2 SEGMENTATION caused by the ROW_NUMBER(). I tested this on a >4mil records table and it returns in split second:

DECLARE @SortExpression VARCHAR(32)  SET @SortExpression = 'StockNumber'
DECLARE @SortDirection BIT           SET @SortDirection  = 1
DECLARE @startRowIndex BIGINT        SET @startRowIndex  = 1000
DECLARE @maximumRows BIGINT          SET @maximumRows    = 5000

DECLARE @vsSQL AS NVARCHAR(MAX)
SET @vsSQL = ''
SET @vsSQL = @vsSQL + 'SELECT sortedItems.Id, sortedItems.StockNumber, sortedItems.Name, sortedItems.Description FROM ( '
SET @vsSQL = @vsSQL + 'SELECT s.Id, s.StockNumber, s.Name, s.Description, '
SET @vsSQL = @vsSQL + 'ROW_NUMBER() OVER (ORDER BY ' + @SortExpression + ' ' + CASE @SortDirection WHEN 1 THEN 'DESC' ELSE 'ASC' END + ') AS RowNo '
SET @vsSQL = @vsSQL + 'FROM StockItems s '
SET @vsSQL = @vsSQL + ') AS sortedItems '
SET @vsSQL = @vsSQL + 'WHERE RowNo BETWEEN ' + CONVERT(VARCHAR,@startRowIndex) + ' AND ' + CONVERT(VARCHAR,@startRowIndex+@maximumRows) + ' '
SET @vsSQL = @vsSQL + 'ORDER BY sortedItems.RowNo'

PRINT @vsSQL
EXEC sp_executesql @vsSQL

Upvotes: 2

Related Questions