Reputation: 10614
I have a table, Customers, that has seven columns: ID, CustomerName, Address1, Address2, City, State, PostCode. I am trying to do cursor pagination via an API over the result set but the user can determine the sorting column. There is no unique constraint on any field but the ID field (which is an auto-increment). When I do the sorting on the ID field, I am able to page through the entire dataset without problem. For example, my first call to the table is
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
c.Id desc
I take the 10 results, set the 11th as my cursor and then my next query would be
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where ID <= 999990
ORDER BY
c.Id desc
And so on and so forth.
If I page via a column that isn't quite as straightforward, I run into a perplexing problem:
Here is a resultset where it is ordered by the CustomerName.
So in this case, I would set my cursor to 'Customer 99999' and my SQL query would be
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
WHERE CustomerName <= 'Customer 99999'
ORDER BY
C.CustomerName DESC
and I'd get the following results:
That works fine when the results don't have duplicates in the columns. But when I sort via City, there are a lot of rows with the same city, so my thinking was that I should sort by two keys at all times: inputted column and then a unique key (ID). So my SQL would look like this:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
C.City DESC, C.ID DESC
So, now my logic would be to keep track of the 11th row's sorted column value and the ID. In this case, I'd input City-99,998999 as my cursor. So my next query would be:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.city <= 'City-99' and ID <= 998999
ORDER BY
C.City DESC, C.ID DESC
And that works. But if I apply the same logic to a dataset that doesn't have repeating values, the logic doesn't hold.
If I were to take the 11th values from this dataset which is sorted via CustomerName:
And take the 11th rows as my cursor, my query would be the following:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.CustomerName <= 'Customer 99999' and ID <= 99999
ORDER BY
C.CustomerName DESC, C.ID DESC
Is there a method I am not thinking about to make a straightforward rule for the efficient moving of the cursor? It seems like sometimes doing two keys is beneficial when there are repeating values in the selected column but not beneficial when there are no repeating values.
Upvotes: 0
Views: 465
Reputation: 10614
I have been able to coble together a solution that is a combination of @The Impaler's answer and the information use-the-index-luke.com's website, specifically this tidbit:
SELECT *
FROM ( SELECT *
FROM sales
WHERE sale_date <= ?
AND NOT (sale_date = ? AND sale_id >= ?)
ORDER BY sale_date DESC, sale_id DESC
)
WHERE rownum <= 10
I took two options, one using offset / limit and one using the cursor based approach I was looking for.
The poor performing SQL was:
select c.* FROM customers c order by city asc, id asc offset 999989 rows FETCH NEXT 11 ROWS ONLY;
When I combine the prior answers and the website suggestions together, I came up with this SQL:
SELECT top(10) * FROM ( SELECT * FROM customers WHERE City <= 'City-99'
AND NOT (City = 'City-99' AND ID >= 999999 )) t ORDER BY City DESC, ID DESC
The SQL Server profiler view was even more black and white.
CPU / Reads / Duration
Offset / Limit 2049 22163 553
Cursor 1626 22163 274
So, while I didn't reduce the reads between the statements, the CPU and duration were sped up significantly.
I was able to sort by any column, use the combination key of the sort column and the ID column, and get pretty good performance without indexing any of the columns.
Speaking of indexing columns, I went back and put an index on the City column to see if it would make a difference:
CPU / Reads / Duration
Offset / Limit 2156 22163 586
Cursor 0 48 48
Running the same SQL above, the improvements become even more drastic.
And I recognize that proper database hygiene would be to know ahead of time what columns to index, but I'm releasing systems that aren't aware of how the customers will use them initially, which is why I was pushing for something better than Limit / offset. My hope is that we capture the high frequency of user sorting events and then recommend an index if one doesn't exist for the primarily used columns the customer sorts by.
Upvotes: 0
Reputation: 48865
Always sort using a "unique key" when implementing Backend Pagination.
The trivial case is when you sort by ID
, that is a key.
When sorting by another criteria that is not unique, then make it unique. For example, when sorting by City
add ID
at the end of the sorting terms, as in:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
ORDER BY
C.City DESC, C.ID DESC -- added ID here
By adding ID
at the end of the ORDER BY
clause, you make the sorting columns unique. And by doing so, the backend pagination will work as expected when retrieving the next page and so forth, as in:
SELECT TOP(11)
c.*
FROM [Customers] AS [c]
where c.city < 'City-99' or c.city = 'City-99' and ID <= 998999
ORDER BY
C.City DESC, C.ID DESC
Since SQL Server does not implement "tuple inequality" the predicate in the WHERE
clause looks clunky. However, it will work well.
Note: A tuple inequality can rephrase the search predicate as (c.city, c.id) <= ('City-99', 998999)
. They are implemented in DB2, PostgreSQL, MySQL, MariaDB. Unfortunately, they are not implemented in Oracle or SQL Server. They allow the query to be optimized in much better way, if performance is a concern.
Upvotes: 2
Reputation: 5167
Pagination in SQL Server can be done by two differents ways in standard ISO SQL:
1) with ORDER BY only :
SELECT *
FROM Customers
ORDER BY City DESC, ID DESC OFFSET 0 ROWS FETCH NEXT 11 ROWS ONLY;
To paginate with a 10 rows page and having page 7, you can execute this query :
SELECT *
FROM Customers
ORDER BY City DESC, ID DESC OFFSET 1 + 10 * 7 ROWS FETCH NEXT 10 ROWS ONLY;
More generically you can use SQL variables :
DECLARE @PAGE INT = 7, @ROWS INT = 10;
SELECT *
FROM Customers
ORDER BY City DESC, ID DESC OFFSET 1 + @ROWS * @PAGE ROWS FETCH NEXT @ROWS ROWS ONLY;
If OFFSET / FETCH is not available in your SQL Server version...
2) Pagination with ROW_NUMBER() window function:
WITH
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM Customers
)
SELECT *
FROM QUERY
WHERE ROWNUM <= 11
ORDER BY 1 City DESC, ID DESC;
To paginate with a 10 rows page and having page 7, you can execute this query :
WITH
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM Customers
)
SELECT *
FROM QUERY
WHERE ROWNUM BETWEEN 1 + 10 * 7 AND (7 + 1) * 10
ORDER BY 1 City DESC, ID DESC;
And generically :
DECLARE @PAGE INT = 7, @ROWS INT = 10;
WITH
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM Customers
)
SELECT *
FROM QUERY
WHERE ROWNUM BETWEEN 1 + @ROWS * @PAGE AND (@PAGE + 1) * @ROWS
ORDER BY 1 City DESC, ID DESC;
By the way TOP is a non standard operator specific to MS SQL Server and should not be used if some SQL ISO standard can do the same or best...
Upvotes: 0