Josh
Josh

Reputation: 10614

How can I do cursor pagination with string based columns and integer values?

I have a table, Customers, that has seven columns: ID, CustomerName, Address1, Address2, City, State, PostCode. I am trying to do cursor pagination via an API over the result set but the user can determine the sorting column. There is no unique constraint on any field but the ID field (which is an auto-increment). When I do the sorting on the ID field, I am able to page through the entire dataset without problem. For example, my first call to the table is

SELECT TOP(11) 
c.*
FROM [Customers] AS [c]
ORDER BY 
c.Id desc

enter image description here I take the 10 results, set the 11th as my cursor and then my next query would be

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where ID <= 999990
ORDER BY 
c.Id desc

Second Cursor

And so on and so forth.

If I page via a column that isn't quite as straightforward, I run into a perplexing problem:

Here is a resultset where it is ordered by the CustomerName.

Third Cursor

So in this case, I would set my cursor to 'Customer 99999' and my SQL query would be

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
WHERE CustomerName <= 'Customer 99999'
ORDER BY 
C.CustomerName DESC

and I'd get the following results: Top 11 customers by name

That works fine when the results don't have duplicates in the columns. But when I sort via City, there are a lot of rows with the same city, so my thinking was that I should sort by two keys at all times: inputted column and then a unique key (ID). So my SQL would look like this:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
ORDER BY 
C.City DESC, C.ID DESC

Top 11 Customers by City

So, now my logic would be to keep track of the 11th row's sorted column value and the ID. In this case, I'd input City-99,998999 as my cursor. So my next query would be:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where c.city <= 'City-99' and ID <= 998999
ORDER BY 
C.City DESC, C.ID DESC

And that works. But if I apply the same logic to a dataset that doesn't have repeating values, the logic doesn't hold.

If I were to take the 11th values from this dataset which is sorted via CustomerName:

Trouble Dataset

And take the 11th rows as my cursor, my query would be the following:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where c.CustomerName <= 'Customer 99999' and ID <= 99999
ORDER BY 
C.CustomerName DESC, C.ID DESC

Is there a method I am not thinking about to make a straightforward rule for the efficient moving of the cursor? It seems like sometimes doing two keys is beneficial when there are repeating values in the selected column but not beneficial when there are no repeating values.

Upvotes: 0

Views: 465

Answers (3)

Josh
Josh

Reputation: 10614

I have been able to coble together a solution that is a combination of @The Impaler's answer and the information use-the-index-luke.com's website, specifically this tidbit:

SELECT *
  FROM ( SELECT *
           FROM sales
          WHERE sale_date <= ?
            AND NOT (sale_date = ? AND sale_id >= ?)
          ORDER BY sale_date DESC, sale_id DESC
       )
 WHERE rownum <= 10

I took two options, one using offset / limit and one using the cursor based approach I was looking for.

The poor performing SQL was:

select c.* FROM customers c order by city asc, id asc offset 999989 rows FETCH NEXT 11 ROWS ONLY;

Poor limit / offset

When I combine the prior answers and the website suggestions together, I came up with this SQL:

   SELECT top(10) * FROM ( SELECT * FROM customers   WHERE City <= 'City-99'
         AND NOT (City = 'City-99'  AND ID >= 999999 )) t  ORDER BY City DESC, ID DESC

Good performance

The SQL Server profiler view was even more black and white.

                CPU / Reads / Duration
Offset / Limit 2049  22163    553
Cursor         1626  22163    274

So, while I didn't reduce the reads between the statements, the CPU and duration were sped up significantly.

I was able to sort by any column, use the combination key of the sort column and the ID column, and get pretty good performance without indexing any of the columns.

Speaking of indexing columns, I went back and put an index on the City column to see if it would make a difference:

                CPU / Reads / Duration
Offset / Limit 2156  22163    586
Cursor            0     48    48

Running the same SQL above, the improvements become even more drastic.

And I recognize that proper database hygiene would be to know ahead of time what columns to index, but I'm releasing systems that aren't aware of how the customers will use them initially, which is why I was pushing for something better than Limit / offset. My hope is that we capture the high frequency of user sorting events and then recommend an index if one doesn't exist for the primarily used columns the customer sorts by.

Upvotes: 0

The Impaler
The Impaler

Reputation: 48865

Always sort using a "unique key" when implementing Backend Pagination.

The trivial case is when you sort by ID, that is a key.

When sorting by another criteria that is not unique, then make it unique. For example, when sorting by City add ID at the end of the sorting terms, as in:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
ORDER BY 
C.City DESC, C.ID DESC -- added ID here

By adding ID at the end of the ORDER BY clause, you make the sorting columns unique. And by doing so, the backend pagination will work as expected when retrieving the next page and so forth, as in:

SELECT TOP(11) 
    c.*
FROM [Customers] AS [c]
where c.city < 'City-99' or c.city = 'City-99' and ID <= 998999
ORDER BY 
C.City DESC, C.ID DESC

Since SQL Server does not implement "tuple inequality" the predicate in the WHERE clause looks clunky. However, it will work well.

Note: A tuple inequality can rephrase the search predicate as (c.city, c.id) <= ('City-99', 998999). They are implemented in DB2, PostgreSQL, MySQL, MariaDB. Unfortunately, they are not implemented in Oracle or SQL Server. They allow the query to be optimized in much better way, if performance is a concern.

Upvotes: 2

SQLpro
SQLpro

Reputation: 5167

Pagination in SQL Server can be done by two differents ways in standard ISO SQL:

1) with ORDER BY only :

SELECT *
FROM   Customers
ORDER  BY City DESC, ID DESC OFFSET 0 ROWS FETCH NEXT 11 ROWS ONLY;

To paginate with a 10 rows page and having page 7, you can execute this query :

SELECT *
FROM   Customers
ORDER  BY City DESC, ID DESC OFFSET 1 + 10 * 7  ROWS FETCH NEXT 10 ROWS ONLY;

More generically you can use SQL variables :

DECLARE @PAGE INT = 7, @ROWS INT = 10;
SELECT *
FROM   Customers
ORDER  BY City DESC, ID DESC OFFSET 1 + @ROWS * @PAGE  ROWS FETCH NEXT @ROWS ROWS ONLY;

If OFFSET / FETCH is not available in your SQL Server version...

2) Pagination with ROW_NUMBER() window function:

WITH 
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM   Customers
)
SELECT * 
FROM   QUERY
WHERE  ROWNUM <= 11
ORDER  BY 1 City DESC, ID DESC;

To paginate with a 10 rows page and having page 7, you can execute this query :

WITH 
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM   Customers
)
SELECT * 
FROM   QUERY
WHERE  ROWNUM BETWEEN 1 + 10 * 7 AND (7 + 1) * 10
ORDER  BY 1 City DESC, ID DESC;

And generically :

DECLARE @PAGE INT = 7, @ROWS INT = 10;
WITH 
QUERY AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY City DESC, ID DESC) AS ROWNUM
FROM   Customers
)
SELECT * 
FROM   QUERY
WHERE  ROWNUM BETWEEN 1 + @ROWS * @PAGE AND (@PAGE + 1) * @ROWS
ORDER  BY 1 City DESC, ID DESC;

By the way TOP is a non standard operator specific to MS SQL Server and should not be used if some SQL ISO standard can do the same or best...

Upvotes: 0

Related Questions