cuongle
cuongle

Reputation: 75306

Columnstore index - slow performance on offset-fetch query

We have Fact table around 35M rows on Azure database (premium tier), this table has cluster columnstore index enabled in order to boost query performance.

We did pagination (to index on Elastic Search) on Fact table using similar below code:

SELECT *
FROM [SPENDBY].[FactInvoiceDetail]
ORder by id
offset 1000000 rows fetch next 1000 rows only

But this query performs so slow, even over 10 minutes, it's not finished. If we change to use TOP, it works really well and take around 30 seconds:

SELECT TOP 1000 * 
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000 
ORDER BY Id

The estimated execution plan for offset-fetch query:

enter image description here

I am not sure that I understand whether offset-fetch query performs very poorly on cluster columnstore index or not.

This table also have a lot of none-cluster B-tree indexes on foreign keys and one unique index on the Id of Fact table in order to boost performance

This execution plan for offset-fetch query:

https://pastebin.com/BM8MXQMg

Upvotes: 3

Views: 2862

Answers (3)

David Browne - Microsoft
David Browne - Microsoft

Reputation: 89006

There are a few issues here.

1) Ordering BTree index is not a covering index for the paging query.

2) The rows must be reconstructed from the CCI.

3) The offset is large.

Paging queries need a BTree index on the ordering columns to calculate which rows should be returned, and if that BTree index doesn't include all the requested columns, then a row lookup be required for each row. This is the "Nested Loops" operator in the query plan.

But the rows are stored in a CCI which means that each column is in a separate data structure, and reading a single row requires one Logical IO for each column, for each row. That's why this query is especially expensive. And why a CCI is a poor choice for a paging query. A Clustered Index on the ordering columns, or a non-clustered index on the ordering columns with the remaining requested columns included would be much better.

A secondary and smaller issue here is the large offset. SQL has to skip over the offset rows, counting them at it goes. So this will reading the first N pages of the BTree leaf level pages to skip over rows.

Upvotes: 4

TheGameiswar
TheGameiswar

Reputation: 28890

The main issue here is with OFFSET large value..

offset 1000000 rows fetch next 1000 rows only

OFFSet and Fetch works great,when the OFFSET value is small,see below example for more details

SELECT orderid, orderdate, custid, filler
FROM dbo.Orders
ORDER BY orderdate DESC, orderid DESC
OFFSET 50 ROWS FETCH NEXT 10 ROWS ONLY;

I have the order by columns as key columns and columns in select are included..this results in below plan..

key point to observe here is SQLServer ends up reading Offset+fetch (50+10 ) rows and then finally filtering 10 rows

enter image description here

So with your large Offset, you will be ending with 1000000+1000 row reads even with a suitable index,which is very huge

If you can ask ,sql server to filter out 1000 rows immediately after scan,that can help your query..this may be(not tested for your schema) achieved by rewriting your query like below

WITH CLKeys AS
(
 SELECT ID
 FROM yourtable
 ORDER BY ID desc
OFFSET 500000 ROWS FETCH FIRST 10 ROWS ONLY
)
SELECT K.*, O.rest of columns 
FROM CLKeys AS K
CROSS APPLY (SELECT columns needed other than id
FROM yourtable  AS A
WHERE A.id= K.id) AS O
ORDER BY Id desc;

References:
http://sqlmag.com/t-sql/offsetfetch-part-1#comment-25061

Upvotes: 2

Stefanos Zilellis
Stefanos Zilellis

Reputation: 611

this statement:

SELECT TOP 1000 * 
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000 
ORDER BY Id

works entirely with the (clustered?) ID field index (is primary key?) having ID > 1000000 ready

the other statement sorts and SEARCHES for the ID value that will fulfill the offset 1000000 rows

the offset 1000000 rows is not equal to WHERE ID > 1000000 for the optimizer unless there are no gaps to the ID values.

Upvotes: 2

Related Questions