Reputation: 5292
I already spent some time trying to figure this out, but I am still somewhat stuck an I can't really find the solution online as I think I am missing the keywords.
I want to update an SQL tables in batches, meaning I have a few million entries and want to update index 0-999, 1000-1999 step by step to avoid a huge database lock.
This is what I found:
DECLARE @Rows INT,
@BatchSize INT;
SET @BatchSize = 2500;
SET @Rows = @BatchSize;
WHILE (@Rows = @BatchSize)
BEGIN
UPDATE TOP(@BatchSize) db1
SET db1.attr = db2.attr
FROM DB1 db1
LEFT JOIN DB2 db2
ON db1.attr2 = db2.attr2
SET @Rows = @@ROWCOUNT;
END;
I simplified my statement a little bit as you can see, but it should still be clear how I approached the whole problem.
However, this thing loops forever, and when looking at the output it changed much more rows than there are in the database.
I checked the same loop with a select statement inside later on and found out that it seems to simply select the first @BatchSize rows of the table on and on, even though I thought it would progress in the index with every iteration.
How can I change this so it actually does progress by @BatchSize indices every iteration instead of simply targeting the same rows everytime?
Upvotes: 2
Views: 5109
Reputation: 10277
You need some limiting factor to decide which rows are hit each loop. Generally you will use an id
field. There are lots of ways to approach it, but here is one way:
DECLARE @MinID int = 1;
DECLARE @MaxID int = 2500;
DECLARE @Rows int = 1;
DECLARE @Batchsize int = 2500;
WHILE (@Rows > 0) -- need greater than 0 to enter loop
BEGIN
UPDATE db1
SET db1.attr = db2.attr
FROM DB1 db1
LEFT JOIN DB2 db2 ON db1.attr2 = db2.attr2
WHERE db1.ID BETWEEN @MinID AND @MaxID
SET @Rows = @@ROWCOUNT
SET @MinID = @MinID + @Batchsize
SET @MaxID = @MaxID + @Batchsize
END
Replace db1.ID
with whatever field works best in your table schema.
Note, your approach would work if you had some kind of WHERE
clause on the update query that prevented the same rows from being returned.
Ex. UPDATE table SET id = 1 WHERE id = 2
won't pull the same rows in a second execution
Upvotes: 4
Reputation: 45096
You are just updating the same rows. Need a and <>
.
Left join? If you really want to assign null values then use a separate update.
DECLARE @Rows INT,
@BatchSize INT;
SET @BatchSize = 2500;
SET @Rows = @BatchSize;
WHILE (@Rows = @BatchSize)
BEGIN
UPDATE TOP(@BatchSize) db1
SET db1.attr = db2.attr
FROM DB1 db1
JOIN DB2 db2
ON db1.attr2 = db2.attr2
AND db1.attr <> db2.attr
SET @Rows = @@ROWCOUNT;
END;
And you can do this:
select 1
WHILE (@@ROWCOUNT > 0)
BEGIN
UPDATE TOP(2000) db1
SET db1.attr = db2.attr
FROM DB1 db1
JOIN DB2 db2
ON db1.attr2 = db2.attr2
AND db1.attr <> db2.attr
END;
Upvotes: 0
Reputation: 82474
One way to do it is using a cte with row_number:
DECLARE @BatchSize int = 2500,
@LastRowUpdated int = 0;
@Count int
SELECT @Count = COUNT(*) FROM db1;
;WITH CTE AS
(
SELECT attr,
attr2,
ROW_NUMBER() OVER(ORDER BY attr, atrr2) As RN
FROM db1
)
WHILE @LastRowUpdated < @Count
BEGIN
UPDATE c
SET attr = db2.atrr
FROM CTE c
LEFT JOIN DB2 ON c.attr2 = db2.attr2
WHERE c.RN > @LastRowUpdated
AND c.RN < (@LastRowUpdated +1) * @BatchSize
SELECT @LastRowUpdated += 1
END
This will update 2500 records each step of the loop.
Upvotes: 0