Sossenbinder
Sossenbinder

Reputation: 5292

How do I update a SQL table in batches?

I already spent some time trying to figure this out, but I am still somewhat stuck an I can't really find the solution online as I think I am missing the keywords.

I want to update an SQL tables in batches, meaning I have a few million entries and want to update index 0-999, 1000-1999 step by step to avoid a huge database lock.

This is what I found:

DECLARE @Rows INT,
        @BatchSize INT;

SET @BatchSize = 2500;
SET @Rows = @BatchSize;

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP(@BatchSize) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    LEFT JOIN DB2 db2
    ON db1.attr2 = db2.attr2

    SET @Rows = @@ROWCOUNT;
END;

I simplified my statement a little bit as you can see, but it should still be clear how I approached the whole problem.

However, this thing loops forever, and when looking at the output it changed much more rows than there are in the database.

I checked the same loop with a select statement inside later on and found out that it seems to simply select the first @BatchSize rows of the table on and on, even though I thought it would progress in the index with every iteration.

How can I change this so it actually does progress by @BatchSize indices every iteration instead of simply targeting the same rows everytime?

Upvotes: 2

Views: 5109

Answers (3)

Aaron Dietz
Aaron Dietz

Reputation: 10277

You need some limiting factor to decide which rows are hit each loop. Generally you will use an id field. There are lots of ways to approach it, but here is one way:

DECLARE @MinID int = 1;
DECLARE @MaxID int = 2500;
DECLARE @Rows int = 1;
DECLARE @Batchsize int = 2500;

WHILE (@Rows > 0)  -- need greater than 0 to enter loop
BEGIN
       UPDATE db1
       SET db1.attr = db2.attr
       FROM DB1 db1
       LEFT JOIN DB2 db2 ON db1.attr2 = db2.attr2
       WHERE db1.ID BETWEEN @MinID AND @MaxID

SET @Rows = @@ROWCOUNT
SET @MinID = @MinID + @Batchsize
SET @MaxID = @MaxID + @Batchsize

END

Replace db1.ID with whatever field works best in your table schema.

Note, your approach would work if you had some kind of WHERE clause on the update query that prevented the same rows from being returned.

Ex. UPDATE table SET id = 1 WHERE id = 2 won't pull the same rows in a second execution

Upvotes: 4

paparazzo
paparazzo

Reputation: 45096

You are just updating the same rows. Need a and <>.

Left join? If you really want to assign null values then use a separate update.

DECLARE @Rows INT,
        @BatchSize INT;

SET @BatchSize = 2500;
SET @Rows = @BatchSize;

WHILE (@Rows = @BatchSize)
BEGIN
    UPDATE TOP(@BatchSize) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    JOIN DB2 db2
      ON db1.attr2 = db2.attr2 
     AND db1.attr <> db2.attr

    SET @Rows = @@ROWCOUNT;
END;

And you can do this:

select 1
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE TOP(2000) db1
    SET db1.attr = db2.attr
    FROM DB1 db1
    JOIN DB2 db2
      ON db1.attr2 = db2.attr2 
     AND db1.attr <> db2.attr
END;

Upvotes: 0

Zohar Peled
Zohar Peled

Reputation: 82474

One way to do it is using a cte with row_number:

DECLARE @BatchSize int = 2500,
        @LastRowUpdated int = 0;
        @Count int


SELECT @Count = COUNT(*) FROM db1;


;WITH CTE AS
(
    SELECT  attr, 
            attr2,
            ROW_NUMBER() OVER(ORDER BY attr, atrr2) As RN
    FROM db1
)

WHILE @LastRowUpdated < @Count
BEGIN

    UPDATE c
    SET attr = db2.atrr
    FROM CTE c
    LEFT JOIN DB2 ON c.attr2 = db2.attr2
    WHERE c.RN > @LastRowUpdated
    AND c.RN < (@LastRowUpdated +1) * @BatchSize

    SELECT @LastRowUpdated += 1

END

This will update 2500 records each step of the loop.

Upvotes: 0

Related Questions