Rory
Rory

Reputation: 969

Why is columnstore index not being used

I have a non-clustered columnstore index on all columns a 40m record non-memory optimized table on SQL Server 2016 Enterprise Edition.

A query forcing the use of the columnstore index will perform significantly faster but the optimizer continues to choose to use the clustered index and other non-clustered indexes. I have lots of available RAM and am using appropriate queries against a dimensional model.

Why won't the optimizer choose the columnstoreindex? And how can I encourage its use (without using a hint)?

Here is a sample query not using columnstore:

SELECT
  COUNT(*),
  SUM(TradeTurnover),
  SUM(TradeVolume)
FROM DWH.FactEquityTrade e
--with (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
  ON e.TradeDateId = d.DateId
 JOIN DWH.DimInstrument i
  ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2

It takes 7 seconds without hint and a fraction of a second with the hint. The query plan without the hint is here. The query plan with the hint is here.

The create statement for the columnstore index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
    [EquityTradeID],
    [InstrumentID],
    [TradingSysTransNo],
    [TradeDateID],
    [TradeTimeID],
    [TradeTimestamp],
    [UTCTradeTimeStamp],
    [PublishDateID],
    [PublishTimeID],
    [PublishedDateTime],
    [UTCPublishedDateTime],
    [DelayedTradeYN],
    [EquityTradeJunkID],
    [BrokerID],
    [TraderID],
    [CurrencyID],
    [TradePrice],
    [BidPrice],
    [OfferPrice],
    [TradeVolume],
    [TradeTurnover],
    [TradeModificationTypeID],
    [InColumnStore],
    [TradeFileID],
    [BatchID],
    [CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

Update. Plan using Count(EquityTradeID) instead of Count(*) and with hint included

Upvotes: 12

Views: 3417

Answers (3)

Steven Hibble
Steven Hibble

Reputation: 553

You're asking SQL Server to choose a complicated query plan over a simple one. Note that when using the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore). When using just the rowstore index, it can do a seek (I assume TradeDateId is the leading column on that index). It does still have to do a key lookup, but it's simpler.

I can see two options to get this behavior without a hint:

First, remove InColumnStore from the columnstore index definition and cover the entire table. That's what you're asking from the columnstore - to cover everything.

If that's not possible, you can use a UNION ALL to explicitly split the data:

WITH workaround
     AS (
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 1
         UNION ALL
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
        )
     SELECT COUNT(*)
          , SUM(TradeTurnover)
          , SUM(TradeVolume)
     FROM workaround e
          JOIN DWH.DimDate d
            ON e.TradeDateId = d.DateId
          JOIN DWH.DimInstrument i
            ON i.instrumentid = e.instrumentid
     WHERE d.DateId >= 20160201
           AND i.instrumentid = 2;

Upvotes: 6

Asromi rOmi
Asromi rOmi

Reputation: 199

Try this one: Bridge your query

Select * 
Into #DimDate
From DWH.DimDate
WHERE DateId >= 20160201

Select  COUNT(1), SUM(TradeTurnover), SUM(TradeVolume)
From DWH.FactEquityTrade e
Inner Join DWH.DimInstrument i ON i.instrumentid = e.instrumentid 
     And i.instrumentid = 2
Left Join #DimDate d ON e.TradeDateId = d.DateId

How fast this query running ?

Upvotes: -1

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

Your index is a filtered index (it has a WHERE predicate).

Optimizer would use such index only when the query's WHERE matches the index's WHERE. This is true for classic indexes and most likely true for columnstore indexes. There can be other limitations when optimizer would not use filtered index.

So, either add WHERE ([InColumnStore]=(1)) to your query, or remove it from the index definition.

You said in the comments: "the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows". Does "all rows" here mean "all rows of the whole table" or just "all rows of the result set"? Anyway, most likely optimizer doesn't know that (even though it could have derived that from statistics), which means that the plan which uses such index has to explicitly do extra checks/lookups, which optimizer considers too expensive.

Here are few articles on this topic:

Why isn’t my filtered index being used? by Rob Farley

Optimizer Limitations with Filtered Indexes by Paul White.

An Unexpected Side-Effect of Adding a Filtered Index by Paul White.

How filtered indexes could be a more powerful feature by Aaron Bertrand, see the section Optimizer Limitations.

Upvotes: 3

Related Questions