Reputation: 11
I've come across something unexpected in SQL Server 2012.
While trying to implement paging for an API I've inherited, I discovered that ROW_NUMBER() OVER
with a single ORDER BY
column is quite slow on a large data set.
I should preface this by saying that I do not have access to execution plans or index statistics.
I may be able to wrangle these in our non-production environments, but the record counts there are much lower so I'm not sure it would be of much use.
SELECT
a.Erp_PK
FROM
(SELECT
ROW_NUMBER() OVER(ORDER by Erp_RowGUID asc) AS Row#,
Erp_PK
FROM
Erp
JOIN
Emp ON Emp_PK = Erp_EmpFK
WHERE
Emp_CompanyFK = 2611) a
WHERE
Row# BETWEEN 399001 AND 400000
The Erp
table contains over 32,000,000 records and the internal where clause above returns over 440,000.
I don't know why the person who created the API decided to order by a GUID
, but this column does have a non-unique, non-clustered index.
The above query runs in about 30 seconds.
After trying a few things, I discovered that adding Erp_LastModified
(also with a non-unique, non-clustered index) as a secondary sort reduced query time to 1 second.
Query time jumped back up to 30 secs with a single ORDER BY
of Erp_LastModified
. Then, back down to 2 secs with CAST
(Exp_RowGUID as VARCHAR(100)
).
I'm not looking for a solution so much as some idea of what is going on here.
This all makes me wonder about the health of our indexes which, again, I have limited access to.
Thanks.
Upvotes: 1
Views: 1796
Reputation: 95620
You could try using OFFSET
instead, otherwise, if this is no improvement, include the execution plan of your query, along with the DDL of your table(s) with indexes:
SELECT Erp_PK
FROM Erp
JOIN Emp ON Emp_PK = Erp_EmpFK
WHERE Emp_CompanyFK = 2611
ORDER BY Erp_RowGUID ASC
OFFSET 399000 ROWS FETCH NEXT 999 ROWS ONLY;
Unfortunately if we don't know the DDL for the object(s) and don't have access to the execution plan we can't really offer much more. Plus the fact that any fixes would likely require changes you can't do means that any stabs in the dark couldn't be tested (and i would advise against that on a production system).
It's like asking a car engineer to explain to you how to fix your car because it's accelerating slowly, but you can't explain to him anything of what you see and can't make any changes to the car.
If you can, get your database into a sandbox environment you have more control over, and then we can see what's happening.
Upvotes: 2
Reputation: 960
One theory would be that your second column in the order by clause (the one which improves the performance) is being included in a index with the emp_pk column. You can test this by ordering on that column only and see if the query takes 2 seconds (the index is using the primary key to sort) or it will take 30 seconds (a total scan off all the rows is taking place to generate the ordering).
Upvotes: 0
Reputation: 169
Did you try to change query to utilize pagignation over Order By clause?
SELECT Erp_PK
FROM Erp
JOIN Emp ON Emp_PK = Erp_EmpFK
WHERE Emp_CompanyFK = 2611
Order by a.Erp_PK
OFFSET (399000) ROWS FETCH NEXT (1000) ROWS ONLY
OFFSET values can be a parameters of course
Upvotes: 0