Reputation: 417
Sorry for the long post. We have an extract that runs every 10 minutes or so to transfer data from a source table into our data warehouse.
The code looks something like this:
SELECT
Column1,
Column2,
Column3,
Column4
FROM
dbo.Table
WHERE
LastModifiedDate > @LastRun
OR CreatedDate > @LastRun
OR CancelledDate > @LastRun
OR DeletedDate > @LastRun
The @LastRun
parameter is passed via the SSIS package and contains a the last datetime we ran (so we only ever bring back what we need).
This code runs very slow on the source system, despite only returning 50-70 rows each time.
We have 2 technical teams we liaise on regarding performance issues. One team is suggesting that there are no indexes on the table to support this extract query, and that adding indexes to the date columns will allow the query to run more smoothly.
The other team is saying that the code needs to be re-written as "adding indexes to the date columns will add to the problem as it is too blunt a tool". They have also suggested creating an indexed view for us to extract the data from instead of going to the actual table itself.
What would be the best approach regarding this? Just trying to get opinions really, as I was under the impression that it doesn't matter how the code is written if the columns used to filter have no indexes on them at all?
Appreciate any thoughts!
Upvotes: 0
Views: 42
Reputation: 1269543
OR
s are really bad logic for indexes. If you have a separate index on each column, you could convert this to a union all
:
select . . .
from t
where LastModifiedDate > @LastRun
union all
select . ..
from t
where CreatedDate > @LastRun and
not (LastModifiedDate > @LastRun)
union all
select . ..
from t
where CancelledDate > @LastRun and
not (LastModifiedDate > @LastRun and CreatedDate > @LastRun )
union all
select . ..
from t
where DeletedDate > @LastRun and
not (LastModifiedDate > @LastRun and CreatedDate > @LastRun and CancelledDate > @LastRun);
Note: The logic will be a bit more complicated if the columns are nullable.
This method is also cumbersome, because you need to repeat the query four times and have an index on all four columns. And if SQL Server -- in its infinite wisdom -- decides not to use an index for even one subquery, you are stuck with a worse-performing solution.
And alternative is to use an index on a computed column:
alter table t add most_recent_date as
(case when LastModifiedDate >= CreatedDate and LastModifiedDate >= CancelledDate and LastModifiedDate >= DeletedDate then LastModifiedDate
when CreatedDate >= CancelledDate and CreatedDate >= DeletedDate then CreatedDate
when CancelledDate >= DeletedDate then CancelledDate
else DeletedDate
end) persisted;
create index idx_most_recent_date on t(most_recent_date);
Note that the computed column logic is a wee bit more complicated if any of the dates can be NULL
. If these are set in a trigger, it might be simpler to just modify the trigger to keep the most recent date up-to-date.
Then you can phrase the query as:
where most_recent_date > @LastRun
Upvotes: 1