Reputation: 5684
I have several tables with 30+ million records that I need to perform paginated queries on with the following structure:
CREATE TABLE A (
ID INT NOT NULL,
[DATE] DATETIME NOT NULL,
PRIMARY KEY (ID, [DATE]) --Clustered
)
When dealing with data this size it's impossible to use OFFSET FETCH
as sql server needs to first read through all the records 'till it reaches the 'page' I want.
So for each table I have a secondary 'index' table that I create which has a ROWNUMBER
column which I create using ROW_NUMBER()
to order the records:
CREATE TABLE A_Index (
ROWNUMBER INT NOT NULL,
ID INT NOT NULL,
[DATE] DATETIME NOT NULL,
)
Using this index table we perform our paginated queries as such:
SELECT A.*
FROM A
INNER JOIN A_Index ON A.ID = A_Index.ID AND A.[DATE] = A_Index.[DATE]
WHERE A_Index.ROWNUMBER > 10000000 AND A_Index.ROWNUMBER <= 10100000
So my question is, what is the best index to create on this 'index' table? I've opted for CLUSTERED indexes but can't decide if the ROWNUMBER
column should be the first column in the index or the last. I've looked at the query plan for both and got a mixed bag of results.
--Option 1: ROWNUMBER is the first column in the index:
CREATE CLUSTERED INDEX IDX_ROWNUMBER ON A_Index(ROWNUMBER, ID, [DATE])
--Option 2: ROWNUMBER is the last column in the index:
CREATE CLUSTERED INDEX IDX_ROWNUMBER ON A_Index(ID, [DATE], ROWNUMBER)
With option 1 I've seen needless, at least in my view, sorting of the ID and DATE columns. Whereas Option 2 usually takes longer to start returning results...
Thanks.
Upvotes: 1
Views: 227
Reputation: 520968
I will suggest two possible indices, and then discuss when each one might make more sense. First index:
CREATE INDEX idx1 ON A_Index (ID, [DATE], ROWNUMBER);
This index would speed up the join back to your original table. For each joined record (from both tables), the third index column ROWNUMBER
could then be used to satisfy the range condition in your WHERE
clause. This works because ROWNUMBER
would be sorted ascending, so SQL Server would only have to scan some range.
Second index:
CREATE INDEX idx2 ON A_Index (ROWNUMBER, ID, [DATE]);
This index first splits on ROWNUMBER
, which would let SQL Server narrow down the index to only those records with matching row number values. After this, it would have scan the index to make the join conditions.
The second index would make more sense if you expect your row number range to be relatively narrow, say a few dozen or hundred values only. In this case, the second index would throw away almost all of the table, leaving behind only a few values to scan.
The first index would make more sense if you expect your row number range to be fairly large. In this case, speeding up the join would take higher priority than scanning the row number values.
Upvotes: 1