Reputation: 109
I use SQL Server 2014 and need to populate a newly added datetime column in one table. Two tables are involved (both have > 30 million records):
TableA:
CategoryID, itemID, DateCreated, Deleted, and some other string properties.
This table contains multiple records per item, each with a different DateCreated.
TableB:
CategoryID, itemID, LatestUpdatedDate (this is the newly added column).
Both CategoryID and ItemID are part of an index on this table.
To update TableB's LatestUpdatedDate from TableA on matching CategoryID and ItemID, I used the following MERGE statement:
merge [dbo].[TableB] with (HOLDLOCK) as t
using
(
    select CategoryID, itemID, max(DateCreated) as LatestUpdatedDate
    from dbo.TableA
    where TableA.Deleted = 0
    group by CategoryID, itemID
) as s on t.CategoryID = s.CategoryID and t.itemID = s.itemID
when matched then
    update set t.LatestUpdatedDate = s.LatestUpdatedDate
when not matched then
    insert (CategoryID, itemID, LatestUpdatedDate)
    values (s.CategoryID, s.itemID, s.LatestUpdatedDate);
Given that both tables contain millions of records, how can I optimize this script? Or is there another way to update the table with better performance?
Note: This is a one-off script and the database is live. In the future, a trigger will be added to TableA on insert to keep the date in TableB up to date; a sketch of what that trigger might look like is below.
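For context, a minimal sketch of such a trigger, assuming the column names above (the trigger name and the "only move the date forward" condition are illustrative, not part of the original script):

-- Hypothetical trigger: upsert the latest date per (CategoryID, itemID)
-- into TableB whenever rows are inserted into TableA.
create trigger trg_TableA_AfterInsert on dbo.TableA
after insert
as
begin
    set nocount on;

    merge dbo.TableB with (HOLDLOCK) as t
    using
    (
        select CategoryID, itemID, max(DateCreated) as LatestUpdatedDate
        from inserted
        where Deleted = 0
        group by CategoryID, itemID
    ) as s on t.CategoryID = s.CategoryID and t.itemID = s.itemID
    when matched and s.LatestUpdatedDate > t.LatestUpdatedDate then
        update set t.LatestUpdatedDate = s.LatestUpdatedDate
    when not matched then
        insert (CategoryID, itemID, LatestUpdatedDate)
        values (s.CategoryID, s.itemID, s.LatestUpdatedDate);
end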
Upvotes: 5
Views: 38241
Reputation: 4036
As per Optimizing MERGE Statement Performance, the best you can do is the following:

You may get a performance improvement during the MERGE by creating an index on TableA on (Deleted, CategoryID, itemID) INCLUDE (DateCreated). However, since this is a one-off operation, the resources (time, CPU, space) required to create this index probably won't offset the performance gains vis-à-vis running the query as-is and relying on your existing index.
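If you do decide to create it, the suggested index would look something like this (the index name is illustrative):

-- Covering index for the MERGE source query: the WHERE/GROUP BY columns
-- form the key, and DateCreated is included so the aggregate needs no lookups.
create nonclustered index IX_TableA_Deleted_CategoryID_ItemID
on dbo.TableA (Deleted, CategoryID, itemID)
include (DateCreated);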
Upvotes: 13