Reputation: 75306
We have Fact table around 35M rows on Azure database (premium tier), this table has cluster columnstore index enabled in order to boost query performance.
We did pagination (to index on Elastic Search) on Fact table using similar below code:
SELECT *
FROM [SPENDBY].[FactInvoiceDetail]
ORder by id
offset 1000000 rows fetch next 1000 rows only
But this query performs so slow, even over 10 minutes, it's not finished. If we change to use TOP
, it works really well and take around 30 seconds:
SELECT TOP 1000 *
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000
ORDER BY Id
The estimated execution plan for offset-fetch query:
I am not sure that I understand whether offset-fetch
query performs very poorly on cluster columnstore index or not.
This table also have a lot of none-cluster B-tree indexes on foreign keys and one unique index on the Id
of Fact table in order to boost performance
This execution plan for offset-fetch query:
Upvotes: 3
Views: 2862
Reputation: 89006
There are a few issues here.
1) Ordering BTree index is not a covering index for the paging query.
2) The rows must be reconstructed from the CCI.
3) The offset is large.
Paging queries need a BTree index on the ordering columns to calculate which rows should be returned, and if that BTree index doesn't include all the requested columns, then a row lookup be required for each row. This is the "Nested Loops" operator in the query plan.
But the rows are stored in a CCI which means that each column is in a separate data structure, and reading a single row requires one Logical IO for each column, for each row. That's why this query is especially expensive. And why a CCI is a poor choice for a paging query. A Clustered Index on the ordering columns, or a non-clustered index on the ordering columns with the remaining requested columns included would be much better.
A secondary and smaller issue here is the large offset. SQL has to skip over the offset rows, counting them at it goes. So this will reading the first N pages of the BTree leaf level pages to skip over rows.
Upvotes: 4
Reputation: 28890
The main issue here is with OFFSET large value..
offset 1000000 rows fetch next 1000 rows only
OFFSet and Fetch works great,when the OFFSET value is small,see below example for more details
SELECT orderid, orderdate, custid, filler
FROM dbo.Orders
ORDER BY orderdate DESC, orderid DESC
OFFSET 50 ROWS FETCH NEXT 10 ROWS ONLY;
I have the order by columns as key columns and columns in select are included..this results in below plan..
key point to observe here is SQLServer ends up reading Offset+fetch (50+10 ) rows and then finally filtering 10 rows
So with your large Offset, you will be ending with 1000000+1000 row reads even with a suitable index,which is very huge
If you can ask ,sql server to filter out 1000 rows immediately after scan,that can help your query..this may be(not tested for your schema) achieved by rewriting your query like below
WITH CLKeys AS
(
SELECT ID
FROM yourtable
ORDER BY ID desc
OFFSET 500000 ROWS FETCH FIRST 10 ROWS ONLY
)
SELECT K.*, O.rest of columns
FROM CLKeys AS K
CROSS APPLY (SELECT columns needed other than id
FROM yourtable AS A
WHERE A.id= K.id) AS O
ORDER BY Id desc;
References:
http://sqlmag.com/t-sql/offsetfetch-part-1#comment-25061
Upvotes: 2
Reputation: 611
this statement:
SELECT TOP 1000 *
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000
ORDER BY Id
works entirely with the (clustered?) ID field index (is primary key?) having ID > 1000000 ready
the other statement sorts and SEARCHES for the ID value that will fulfill the offset 1000000 rows
the offset 1000000 rows is not equal to WHERE ID > 1000000 for the optimizer unless there are no gaps to the ID values.
Upvotes: 2