Andy
Andy

Reputation: 1293

How to speed up simple UPDATE query with millions of rows?

How can I speed up this rather simple UPDATE query? It's been running for over 5 hours!

I'm basically replacing SourceID in a table by joining on a new table that houses the Old and New IDs. All these fields are VARCHAR(72) and must stay that way.

Pub_ArticleFaculty table has 8,354,474 rows (8.3 million). ArticleAuthorOldNew has 99,326,472 rows (99.3 million) and only the 2 fields you see below.

There are separate non-clustered indexes on all these fields. Is there a better way to write this query to make it run faster?

UPDATE PF
        SET PF.SourceId = AAON.NewSourceId
    FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
        INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
                   ON AAON.OldFullSourceId = PF.SourceId

Upvotes: 13

Views: 29381

Answers (4)

Binesh Nambiar C
Binesh Nambiar C

Reputation: 162

I don't have enough points to comment on the question. So I'm adding it as an answer. Can you check the basics

  1. Are there any triggers on the table? If there are, there will be additional overhead when you are updating rows.

  2. Are there indexes on the joining columns?

  3. In other cases, does the system perform well? Verify that the system have enough power.

But 8 million records are not much to run more than 1 minute max if processing properly. An execution time of 5 hrs indicates there is a problem somewhere else.

Upvotes: 0

Vince Horst
Vince Horst

Reputation: 5308

In my experience, looping your update so that it acts on small a numbers of rows each iteration is a good way to go. The ideal number of rows to update each iteration is largely dependent on your environment and the tables you're working with. I usually stick around 1,000 - 10,000 rows per iteration.

Example

SET ROWCOUNT 1000 -- Set the batch size (number of rows to affect each time through the loop).
WHILE (1=1) BEGIN

    UPDATE PF
    SET NewSourceId = 1
    FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
            INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
                       ON AAON.OldFullSourceId = PF.SourceId
    WHERE NewSourceId IS NULL -- Only update rows that haven't yet been updated.

    -- When no rows are affected, we're done!
    IF @@ROWCOUNT = 0
        BREAK
END
SET ROWCOUNT 0 -- Reset the batch size to the default (i.e. all rows).
GO

Upvotes: 17

paparazzo
paparazzo

Reputation: 45096

I would

  • Disable the index on PF.SourceId
  • Run the update
  • Then rebuild the index

I don't get the NOLOCK on the table you are updating

UPDATE PF
SET PF.SourceId = AAON.NewSourceId
FROM AA..Pub_ArticleFaculty PF 
INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
        ON AAON.OldFullSourceId = PF.SourceId 
       AND PF.SourceId <> AAON.NewSourceId

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269793

If you are resetting all or almost all of the values, then the update will be quite expensive. This is due to logging and the overhead for the updates.

One approach you can take instead is insert into a temporary table, then truncate, then re-insert:

select pf.col1, pf.col2,  . . . ,
       coalesce(aaon.NewSourceId, pf.sourceid) as SourceId
into temp_pf
from AA..Pub_ArticleFaculty PF LEFT JOIN
     AA2..ArticleAuthorOldNew AAON 
     on AAON.OldFullSourceId = PF.SourceId;

truncate table AA..Pub_ArticleFaculty;

insert into AA..Pub_ArticleFaculty
    select * from temp_pf;

Note: You should either be sure that the columns in the original table match the temporary table or, better yet, list the columns explicitly in the insert.

I should also note that the major benefit is when your recovery mode is simple or bulk-logged. The reason is that logging for the truncate, select into, and insert . . . select is minimal (see here). This savings on the logging can be very significant.

Upvotes: 4

Related Questions