SebastianQ
SebastianQ

Reputation: 11

ROW_NUMBER() OVER is slow with ORDER BY on single column

I've come across something unexpected in SQL Server 2012.

While trying to implement paging for an API I've inherited, I discovered that ROW_NUMBER() OVER with a single ORDER BY column is quite slow on a large data set.

I should preface this by saying that I do not have access to execution plans or index statistics.

I may be able to wrangle these in our non-production environments, but the record counts there are much lower so I'm not sure it would be of much use.

SELECT 
    a.Erp_PK 
FROM
    (SELECT 
         ROW_NUMBER() OVER(ORDER by Erp_RowGUID asc) AS Row#, 
         Erp_PK 
     FROM 
         Erp 
     JOIN 
         Emp ON Emp_PK = Erp_EmpFK 
     WHERE 
         Emp_CompanyFK = 2611) a 
WHERE 
    Row# BETWEEN 399001 AND 400000

The Erp table contains over 32,000,000 records and the internal where clause above returns over 440,000.

I don't know why the person who created the API decided to order by a GUID, but this column does have a non-unique, non-clustered index.

The above query runs in about 30 seconds.

After trying a few things, I discovered that adding Erp_LastModified (also with a non-unique, non-clustered index) as a secondary sort reduced query time to 1 second.

Query time jumped back up to 30 secs with a single ORDER BY of Erp_LastModified. Then, back down to 2 secs with CAST(Exp_RowGUID as VARCHAR(100)).

I'm not looking for a solution so much as some idea of what is going on here.

This all makes me wonder about the health of our indexes which, again, I have limited access to.

Thanks.

Upvotes: 1

Views: 1796

Answers (3)

Thom A
Thom A

Reputation: 95620

You could try using OFFSET instead, otherwise, if this is no improvement, include the execution plan of your query, along with the DDL of your table(s) with indexes:

SELECT Erp_PK
FROM Erp 
     JOIN Emp ON Emp_PK = Erp_EmpFK 
WHERE Emp_CompanyFK = 2611
ORDER BY Erp_RowGUID ASC
OFFSET 399000 ROWS FETCH NEXT 999 ROWS ONLY;

Unfortunately if we don't know the DDL for the object(s) and don't have access to the execution plan we can't really offer much more. Plus the fact that any fixes would likely require changes you can't do means that any stabs in the dark couldn't be tested (and i would advise against that on a production system).

It's like asking a car engineer to explain to you how to fix your car because it's accelerating slowly, but you can't explain to him anything of what you see and can't make any changes to the car.

If you can, get your database into a sandbox environment you have more control over, and then we can see what's happening.

Upvotes: 2

Dwight Reynoldson
Dwight Reynoldson

Reputation: 960

One theory would be that your second column in the order by clause (the one which improves the performance) is being included in a index with the emp_pk column. You can test this by ordering on that column only and see if the query takes 2 seconds (the index is using the primary key to sort) or it will take 30 seconds (a total scan off all the rows is taking place to generate the ordering).

Upvotes: 0

Sergey M
Sergey M

Reputation: 169

Did you try to change query to utilize pagignation over Order By clause?

SELECT Erp_PK 
FROM Erp 
    JOIN Emp ON Emp_PK = Erp_EmpFK 
WHERE Emp_CompanyFK = 2611
Order by a.Erp_PK 
    OFFSET (399000) ROWS FETCH NEXT (1000) ROWS ONLY 

OFFSET values can be a parameters of course

Upvotes: 0

Related Questions