Reputation: 6132
I have a merge statement that needs to compare on many columns. The source table has 26,000 rows. The destination table has several million rows. The desintation table only has a typical Primary Key index on an int-type column.
I did some selects with group by to count the number of unique values in the source.
The test part of the Merge is
Merge Into desttable
Using #temptable
On
(
desttable.ColumnA = #temptable.ColumnA
and
desttable.ColumnB = #temptable.ColumnB
and
desttable.ColumnC = #temptable.ColumnC
and
desttable.ColumnD = #temptable.ColumnD
and
desttable.ColumnE = #temptable.ColumnE
and
desttable.ColumnF = #temptable.ColumnF
)
When Not Matched Then Insert Values (.......)
-- ColumnA: 167 unique values in #temptable
-- ColumnB: 1 unique values in #temptable
-- ColumnC: 13 unique values in #temptable
-- ColumnD: 89 unique values in #temptable
-- ColumnE: 550 unique values in #temptable
-- ColumnF: 487 unique values in #temptable
-- ColumnA: 3690 unique values in desttable
-- ColumnB: 3 unique values (plus null is possible) in desttable
-- ColumnC: 1113 unique values in desttable
-- ColumnD: 2662 unique values in desttable
-- ColumnE: 1770 unique values in desttable
-- ColumnF: 1480 unique values in desttable
The merge right now takes a very, very long time. I think I need to change my primary key but am not sure what the best tactic might be. 26,000 rows can be inserted on the first merge, but subsequent merges might only have ~2,000 inserts to do. Since I have no indexes and only a simple PK, everything is slow. :)
Can anyone point out how to make this better?
Thanks!
Upvotes: 0
Views: 2538
Reputation: 730
My suggestion is if you only need to run it once, then Merge statement is acceptable if time is not that critical. But, if you're going to use the script more often, I think it'll be better if you do it step by step instead of using the Merge statement. Step by step, like creating your own select, insert, update, delete statements in order to attain the goal. With this you'll have more control almost on everything(query optimization, indexing, etc...)
In your case, probably separating the 6 where criteria might be more efficient than combining them all at once. Downside is you'll have longer script.
Upvotes: 1
Reputation: 755541
Well, an obvious candidate would be an index on the columns you use to do your matching in the MERGE
statement - do you have an index on (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF)
on your destitation table??
This tuple of columns is being used to determine whether or not a row from your source table already exists in the database. If you don't have that index nor any other usable index in place, you get a table scan on the large destination table for each row in your source table, basically.
If not: I would try to add it and then see how the runtime behavior changes. Does the MERGE
now run a little less than a very, very long time??
Upvotes: 2