Jan
Jan

Reputation: 1975

Index scan when filtering on column calculated as DATEADD from column with clustered index

I'm puzzled why I'm getting clustered index seeks on query on calculated column that uses simple DATEADD function (that should be deterministic) over other column which is cluster indexed.

Am I able to keep the nonmaterialized calculated column (since this is for existing huge legacy data) and be able to filter by this column and have the sql hit the index? I need to filter on the calculated column (as I cannot change the queries and they need to query with proper offset).

Simplistic example definition:

-- table with calculated column
CREATE TABLE [dbo].[_test](
    [Dt] [datetime] NOT NULL
) ON [PRIMARY]
GO

--clustered index
CREATE CLUSTERED INDEX [Idx] ON [dbo].[_test]
(
    [Dt] ASC
) ON [PRIMARY]
GO



--simulation of legacy data
DECLARE @rdate DATE
DECLARE @startLoopID INT = 1
DECLARE @endLoopID INT = 1000

WHILE @startLoopID <= @endLoopID
BEGIN
    SET @rdate = DATEADD(Hour, ABS(CHECKSUM(NEWID()) % (365 * 24) ), '2020-01-01');
    SET @startLoopID = @startLoopID + 1;

    INSERT INTO [_test] (Dt)
    VALUES (@rdate);
END

--adding the calculated column with proper offset
alter table _test ADD [Dt_2] AS  DATEADD(MINUTE, 300, Dt) 

Example execution: DECLARE @EndTime DATETIME2(7) = '2020-07-10 00:00:00.000'

DECLARE @StartTime DATETIME2(7) = DATEADD(day, -12, @EndTime)

--select is performing seek
select * from _test
where 
 Dt > @StartTime AND Dt < @EndTime
 
--select is performing scan
select * from _test
where 
 Dt_2 > @StartTime AND Dt_2 < @EndTime

Query plan:

Query Plan

When filtering on Dt column I get expected seek. When filtering on Dt_2 column - which is calculated from Dt via deterministic function - I get index scan. In a real scenario with large amount of data this is causing huge performance hit.

Upvotes: 0

Views: 613

Answers (1)

allmhuran
allmhuran

Reputation: 4454

You need to index the calculated column in order for SQL to be able to perform a seek. The calculated column only gets evaluated at select-time unless it is persisted or indexed. Even if it was marked as persisted, it would still need to be scanned without an index. The fact that the expression is deterministic and precise means it can be indexed, but you still need to add the index.

Your comment below indicates you are unable to add an index to the computed column. But in this particular case, you don't actually need one anyway in order to perform the same query, because your calculated column is adding a constant offset to your original column. Therefore, the constant expression can be moved to the right hand side of the comparison operation in the query itself, and SQL will still be able to use the original clustered index.

That is to say, instead of creating a new column which is equal to Dt plus some constant, you can subtract the constant from the RHS of the inequality. Instead of:

alter table _test ADD [Dt_2] AS  DATEADD(MINUTE, 300, Dt);

DECLARE @EndTime DATETIME2(7) = '2020-07-10 00:00:00.000'
DECLARE @StartTime DATETIME2(7) = DATEADD(day, -12, @EndTime);

select * from _test
where 
 Dt_2 > @StartTime AND Dt_2 < @EndTime;

You can use:

--subtract 300 minutes from the @endTime parameter instead of adding 300 minutes to every value of Dt

DECLARE @EndTime DATETIME2(7) = dateadd(minute, -300, '2020-07-10 00:00:00.000');
DECLARE @StartTime DATETIME2(7) = DATEADD(day, -12, @EndTime);

select * from _test
where 
 Dt > @StartTime
AND Dt < @EndTime;

Upvotes: 1

Related Questions