Reputation: 1293
How can I speed up this rather simple UPDATE query? It's been running for over 5 hours!
I'm basically replacing SourceID in a table by joining on a new table that houses the Old and New IDs. All these fields are VARCHAR(72) and must stay that way.
Pub_ArticleFaculty table has 8,354,474 rows (8.3 million). ArticleAuthorOldNew has 99,326,472 rows (99.3 million) and only the 2 fields you see below.
There are separate non-clustered indexes on all these fields. Is there a better way to write this query to make it run faster?
UPDATE PF
SET PF.SourceId = AAON.NewSourceId
FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)
ON AAON.OldFullSourceId = PF.SourceId
Upvotes: 13
Views: 29381
Reputation: 162
I don't have enough points to comment on the question. So I'm adding it as an answer. Can you check the basics
Are there any triggers on the table? If there are, there will be additional overhead when you are updating rows.
Are there indexes on the joining columns?
In other cases, does the system perform well? Verify that the system have enough power.
But 8 million records are not much to run more than 1 minute max if processing properly. An execution time of 5 hrs indicates there is a problem somewhere else.
Upvotes: 0
Reputation: 5308
In my experience, looping your update so that it acts on small a numbers of rows each iteration is a good way to go. The ideal number of rows to update each iteration is largely dependent on your environment and the tables you're working with. I usually stick around 1,000 - 10,000 rows per iteration.
Example
SET ROWCOUNT 1000 -- Set the batch size (number of rows to affect each time through the loop).
WHILE (1=1) BEGIN
UPDATE PF
SET NewSourceId = 1
FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)
ON AAON.OldFullSourceId = PF.SourceId
WHERE NewSourceId IS NULL -- Only update rows that haven't yet been updated.
-- When no rows are affected, we're done!
IF @@ROWCOUNT = 0
BREAK
END
SET ROWCOUNT 0 -- Reset the batch size to the default (i.e. all rows).
GO
Upvotes: 17
Reputation: 45096
I would
I don't get the NOLOCK
on the table you are updating
UPDATE PF
SET PF.SourceId = AAON.NewSourceId
FROM AA..Pub_ArticleFaculty PF
INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)
ON AAON.OldFullSourceId = PF.SourceId
AND PF.SourceId <> AAON.NewSourceId
Upvotes: 1
Reputation: 1269793
If you are resetting all or almost all of the values, then the update
will be quite expensive. This is due to logging and the overhead for the updates.
One approach you can take instead is insert
into a temporary table, then truncate, then re-insert:
select pf.col1, pf.col2, . . . ,
coalesce(aaon.NewSourceId, pf.sourceid) as SourceId
into temp_pf
from AA..Pub_ArticleFaculty PF LEFT JOIN
AA2..ArticleAuthorOldNew AAON
on AAON.OldFullSourceId = PF.SourceId;
truncate table AA..Pub_ArticleFaculty;
insert into AA..Pub_ArticleFaculty
select * from temp_pf;
Note: You should either be sure that the columns in the original table match the temporary table or, better yet, list the columns explicitly in the insert
.
I should also note that the major benefit is when your recovery mode is simple or bulk-logged. The reason is that logging for the truncate, select into
, and insert . . . select
is minimal (see here). This savings on the logging can be very significant.
Upvotes: 4