Reputation: 901
I'm used to get benefits from ROW_NUMBER function in MS SQL Server scripts since 2005 version. But I noticed there is big performance disadvantage querying big tables using this function.
Imagine table with four columns (a real table from external database has more columns, but I used only those to avoid complexity of example):
DECLARE TABLE StockItems (
Id int PRIMARY KEY IDENTITY(1,1),
StockNumber nvarchar(max),
Name nvarchar(max),
[Description] nvarchar(max))
I've written procedure for querying this table filled up by 200 000+ rows with following parameters:
Query:
SELECT sortedItems.Id
,si.StockNumber
,si.Name
,si.Description
FROM (SELECT s.Id
,CASE WHEN @SortDirection=1 THEN
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN
ROW_NUMBER() OVER (ORDER by s.Name DESC)
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN
ROW_NUMBER() OVER (ORDER by s.StockNumber DESC)
ELSE ROW_NUMBER() OVER (ORDER by s.StockNumber DESC)
END
ELSE
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN
ROW_NUMBER() OVER (ORDER by s.Name ASC)
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN
ROW_NUMBER() OVER (ORDER by s.StockNumber ASC)
ELSE ROW_NUMBER() OVER (ORDER by s.StockNumber ASC)
END
END AS RowNo
FROM stockItems s
) as sortedItems
INNER JOIN StockItems si ON sortedItems.Id=si.Id
ORDER BY sortedItems.RowNo
In situation when number of rows is growing rapidly, ROW_NUMBER became ineffective, because must sort all rows.
Please can you help me to avoid this performance disadvantage and speed up the query?
Upvotes: 2
Views: 1855
Reputation: 901
I've found solution how to avoid performance penalty using ROW_NUMBER() function over large result sets. A goal I didn't write in my question was to avoid declaring query as nvarchar variable and executing it, because it can cause open door for SQL injection.
So, solution is to query data as much as possible in required sort order, then query result set and switch ordering and get data only for current page. Finally I can take result ordered in opposite order and order them again.
I defined new variable @innerCount to query most inner result set and order it as query client specify in @sortExpression and @sortDirection variables
SET @innerCount = @startRowIndex + @maximumRows
Select OppositeQuery.Id
,s.StockNumber
,s.Name
,s.Description
FROM (SELECT TOP (@maximumRows) InnerItems.Id
FROM
(SELECT TOP (@innerCount) sti.Id
FROM stockItems sti
ORDER BY
CASE WHEN @SortDirection=1 THEN
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN sti.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN sti.StockNumber
ELSE sti.StockNumber
END
END DESC
CASE WHEN ISNULL(@SortDirection,0)=0 THEN
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN sti.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN sti.StockNumber
ELSE sti.StockNumber
END
END ASC
) as InnerQuery
INNER JOIN StockItems si on InnerQuery.Id=si.Id
ORDER BY
CASE WHEN @SortDirection=1 then
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN si.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN si.StockNumber
ELSE si.StockNumber
END
END ASC
CASE WHEN ISNULL(@SortDirection,0)=0 then
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN si.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN si.StockNumber
ELSE si.StockNumber
END
END ASC
) AS OppositeQuery
INNER JOIN StockItems s on OppositeQuery.Id=s.Id
ORDER BY
CASE WHEN @SortDirection=1 THEN
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN s.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN s.StockNumber
ELSE s.StockNumber
END
END DESC
CASE WHEN ISNULL(@SortDirection,0)=0 THEN
CASE
WHEN CHARINDEX('Name',@SortExpression)=1 THEN s.Name
WHEN CHARINDEX('StockNumber',@SortExpression)=1 THEN s.StockNumber
ELSE s.StockNumber
END
END ASC
Disadvantage of this approach is that I have to sort data three times, but in case of multiple inner joins to StockItems table subqueries are much faster than using ROW_NUMBER() function.
Thank to all contributors for help.
Upvotes: 0
Reputation: 1269883
You can move the case expression to an order by
clause:
order by (case when @SortDirection=1 and CHARINDEX('Name',@SortExpression)=1 then s.name end) desc,
(case when @SortDirection=1 and CHARINDEX('StockNumber',@SortExpression)=1 then s.StockNumber end) desc,
(case when @SortDirection=1 and (CHARINDEX('StockNumber',@SortExpression)<>1 and CHARINDEX('Name',@SortExpression)<>1) then va.match end) desc,
(case when @SortDirection<>1 and CHARINDEX('Name',@SortExpression)=1 then s.name end) asc,
(case when @SortDirection<>1 and CHARINDEX('StockNumber',@SortExpression)=1 then s.StockNmber end) asc,
(case when @SortDirection<>1 and (CHARINDEX('StockNumber',@SortExpression)<>1 and CHARINDEX('Name',@SortExpression)<>1) then va.match end) asc
I notice the expression has va.match
, so it doesn't really match any tables in your query. So, I'm just putting in the order by
expression.
And, yes, as the table gets bigger, this is going to take more time. I don't know that the order by
will be more efficient than the row_number()
, but it is possible.
If you need to order the rows, then you have to do a sort, one way or another (perhaps you could use an index instead). If you don't care about the order, you could take your chances with:
row_number() over (order by (select NULL))
In SQL Server, I have found that this assigns a sequential number without a separate sort. However, this is not guaranteed (I haven't found any documentation to support this use). And, the result is not necessarily stable from one run to the next.
Upvotes: 0
Reputation: 3043
Check the execution path. ROW_NUMBER()
does not have big impact as long as you have the correct index. The problem with your query isn't in the ROW_NUMBER()
.
Use dynamic instead, it will eliminate the 2 SEGMENTATION caused by the ROW_NUMBER()
.
I tested this on a >4mil records table and it returns in split second:
DECLARE @SortExpression VARCHAR(32) SET @SortExpression = 'StockNumber'
DECLARE @SortDirection BIT SET @SortDirection = 1
DECLARE @startRowIndex BIGINT SET @startRowIndex = 1000
DECLARE @maximumRows BIGINT SET @maximumRows = 5000
DECLARE @vsSQL AS NVARCHAR(MAX)
SET @vsSQL = ''
SET @vsSQL = @vsSQL + 'SELECT sortedItems.Id, sortedItems.StockNumber, sortedItems.Name, sortedItems.Description FROM ( '
SET @vsSQL = @vsSQL + 'SELECT s.Id, s.StockNumber, s.Name, s.Description, '
SET @vsSQL = @vsSQL + 'ROW_NUMBER() OVER (ORDER BY ' + @SortExpression + ' ' + CASE @SortDirection WHEN 1 THEN 'DESC' ELSE 'ASC' END + ') AS RowNo '
SET @vsSQL = @vsSQL + 'FROM StockItems s '
SET @vsSQL = @vsSQL + ') AS sortedItems '
SET @vsSQL = @vsSQL + 'WHERE RowNo BETWEEN ' + CONVERT(VARCHAR,@startRowIndex) + ' AND ' + CONVERT(VARCHAR,@startRowIndex+@maximumRows) + ' '
SET @vsSQL = @vsSQL + 'ORDER BY sortedItems.RowNo'
PRINT @vsSQL
EXEC sp_executesql @vsSQL
Upvotes: 2