HappyTown
HappyTown

Reputation: 6514

How to avoid UPDATE statement locking out the entire table when updating large number of records

I am fairly new to locks and hints.

I have a table with very frequent SELECT and INSERT operations. The table has 11 million records.

I have added a new column to it and I need to copy over the data from an existing column in the same table to the new column.

I am planning to use ROWLOCK hint to avoid escalating locks to table level locks and blocking out all other operations on the table. For example:

UPDATE 
    SomeTable WITH (ROWLOCK)
SET
    NewColumn = OldColumn

Questions:

  1. Would a NOLOCK instead of ROWLOCK? Note, once the records are inserted in the table, the value for OldColumn does not change, so NOLOCK would not cause dirty reads.
  2. Does NOLOCK even make sense in this case, because the SQL Server would have to anyways get update locks for UPDATE.
  3. Is there a better way of achieving this?

I know hints are to be avoided and SQL Server usually makes smarter choices, but I don't want to get the table locked out during this update.

Upvotes: 17

Views: 49414

Answers (3)

pacreely
pacreely

Reputation: 1931

Try to update in batches.

DECLARE @Batch INT = 1000
DECLARE @Rowcount INT = @Batch


WHILE @Rowcount > 0
    BEGIN
        ;WITH CTE AS 
        (
            SELECT TOP (@Batch) NewColumn,OldColumn 
            FROM SomeTable 
            WHERE NewColumn <> OldColumn
                  OR (NewColumn IS NULL AND OldColumn IS NOT NULL)
        )
        UPDATE cte
            SET NewColumn = OldColumn;
        SET @Rowcount = @@ROWCOUNT
    END

Upvotes: 19

Jen R
Jen R

Reputation: 1535

We recently had a case where we wanted to do something similar, but slowly over days (updating only a certain number of records per run, and only during certain hours). The most recent data was fine, but millions of rows of older data needed to be updated. Our data table looks something like this:

Create Table FileContent
(
FileContent varchar(max),
File_PK bigint,
NewFileContent varchar(max)
)

And we only needed to update certain rows, but millions of them. We created a table to store our progress so we could use a scheduled job to iterate through and update the main table, then populated this table with the primary keys of the main table records that needed updating:

Create Table FilesToUpdate
(
File_PK bigint,
IsUpdated bit NOT NULL DEFAULT 0
)

Then we scheduled the following script to do the updating (for your own use, play with the batch sizing and scheduling for what works with your system).

/***  
Script to update and fix records.
***/
DECLARE @Rowcount INT = 1 -- 
    ,   @BatchSize INT = 100 -- how many rows will be updated on each iteration of the loop 
    ,   @BatchesToRun INT = 25 -- the max number of times the loop will iterate
    ,   @StartingRecord BIGINT = 1;

-- Get the highest File_PK not already fixed as a starting point.
Select @StartingRecord = MAX(File_PK) From FilesToUpdate where IsUpdated = 0

-- While there are still rows to update and we haven't hit our limit on iterations... 
WHILE (@Rowcount > 0 and @BatchesToRun > 0)   
BEGIN
    print Concat('StartingRecord (Start of Loop): ', @StartingRecord)
    UPDATE FileContent SET  NewFileContent = 'New value here'
    WHERE File_PK BETWEEN (@StartingRecord - @BatchSize + 1) AND @StartingRecord;

    -- @@Rowcount is the number of records affected by the last statement.  If this returns 0, the loop will stop because we've run out of things to update.
    SET @Rowcount = @@ROWCOUNT;
    print Concat('RowCount: ', @Rowcount)

    -- Record which PKs were updated so we know where to start next time around.
    UPDATE FilesToUpdate Set IsUpdated = 1 where File_PK BETWEEN (@StartingRecord - @BatchSize + 1) AND @StartingRecord;

    -- The loop will stop after @BatchSize*@BatchesToRun records are updated. 
    -- If there aren't that many records left to update, the @Rowcount checks will stop it. 
    SELECT @BatchesToRun = @BatchesToRun - 1
    print Concat('Batches Remaining: ',@BatchesToRun)

    -- Set the starting record for the next time through the loop.
    SELECT @StartingRecord -= @BatchSize
    print Concat('StartingRecord (End of Loop): ', @StartingRecord)
END

Upvotes: 0

HappyTown
HappyTown

Reputation: 6514

I took @pacreely's approach (see his answer on this question) of updating in batches and created a update...top variation. I added the (rowlock) hint tell SQL server to keep the locks at row level.

See update...top for details. Also note, that you cannot use order by when using top in update, insert, merge, delete statement so the referenced rows are not arranged in any order.

declare @BatchSize  int = 1000
declare @RowCount   int = @BatchSize

while @RowCount > 0
begin
    update top (@BatchSize) SomeTable with (rowlock)
    set NewColumn = OldColumn
    where 
        NewColumn <> OldColumn      or
        (
            NewColumn is null       and
            OldColumn is not null
        )
    select @RowCount = @@rowcount
end

Upvotes: 8

Related Questions