Jess8766
Jess8766

Reputation: 417

Understanding SQL Server Indexes

Sorry for the long post. We have an extract that runs every 10 minutes or so to transfer data from a source table into our data warehouse.

The code looks something like this:

SELECT 
    Column1,
    Column2,
    Column3,
    Column4
FROM 
    dbo.Table
WHERE
    LastModifiedDate > @LastRun
    OR CreatedDate > @LastRun
    OR CancelledDate > @LastRun
    OR DeletedDate > @LastRun

The @LastRun parameter is passed via the SSIS package and contains a the last datetime we ran (so we only ever bring back what we need).

This code runs very slow on the source system, despite only returning 50-70 rows each time.

We have 2 technical teams we liaise on regarding performance issues. One team is suggesting that there are no indexes on the table to support this extract query, and that adding indexes to the date columns will allow the query to run more smoothly.

The other team is saying that the code needs to be re-written as "adding indexes to the date columns will add to the problem as it is too blunt a tool". They have also suggested creating an indexed view for us to extract the data from instead of going to the actual table itself.

What would be the best approach regarding this? Just trying to get opinions really, as I was under the impression that it doesn't matter how the code is written if the columns used to filter have no indexes on them at all?

Appreciate any thoughts!

Upvotes: 0

Views: 42

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269543

ORs are really bad logic for indexes. If you have a separate index on each column, you could convert this to a union all:

select . . .
from t
where LastModifiedDate > @LastRun
union all
select . ..
from t
where CreatedDate > @LastRun and
      not (LastModifiedDate > @LastRun)
union all
select . ..
from t
where CancelledDate > @LastRun and
      not (LastModifiedDate > @LastRun and CreatedDate > @LastRun )
union all
select . ..
from t
where DeletedDate > @LastRun and
      not (LastModifiedDate > @LastRun and CreatedDate > @LastRun and CancelledDate > @LastRun);

Note: The logic will be a bit more complicated if the columns are nullable.

This method is also cumbersome, because you need to repeat the query four times and have an index on all four columns. And if SQL Server -- in its infinite wisdom -- decides not to use an index for even one subquery, you are stuck with a worse-performing solution.

And alternative is to use an index on a computed column:

alter table t add most_recent_date as
    (case when LastModifiedDate >= CreatedDate and LastModifiedDate >= CancelledDate and LastModifiedDate >= DeletedDate then LastModifiedDate
          when CreatedDate >= CancelledDate and CreatedDate >= DeletedDate then CreatedDate
          when CancelledDate >= DeletedDate then CancelledDate
          else DeletedDate
     end) persisted;

create index idx_most_recent_date on t(most_recent_date);

Note that the computed column logic is a wee bit more complicated if any of the dates can be NULL. If these are set in a trigger, it might be simpler to just modify the trigger to keep the most recent date up-to-date.

Then you can phrase the query as:

where most_recent_date > @LastRun

Upvotes: 1

Related Questions