Nokomo

Reputation: 31

How can I optimize this T-SQL script by avoiding the loop?

I use the following SQL query to update MyTable. The code takes between 5 and 15 minutes to update MyTable as long as the number of rows is <= 100000000, but when the number of rows is > 100000000 the run time grows exponentially. How can I change this code to be set-based instead of using a while loop?

DECLARE @startTime DATETIME
DECLARE @batchSize INT
DECLARE @iterationCount INT
DECLARE @i INT
DECLARE @from INT
DECLARE @to INT

SET @batchSize = 10000
SET @i = 0

SELECT @iterationCount = COUNT(*) / @batchSize
FROM MyTable
WHERE LitraID = 8175
    AND id BETWEEN 100000000 AND 300000000

WHILE @i <= @iterationCount BEGIN

    BEGIN TRANSACTION T

    SET @startTime = GETDATE()
    SET @from = @i * @batchSize
    SET @to = (@i + 1) * @batchSize - 1

    ;WITH data
    AS (
        SELECT DoorsReleased, ROW_NUMBER() OVER (ORDER BY id) AS Row
        FROM MyTable
        WHERE LitraID = 8175
            AND id BETWEEN 100000000 AND 300000000
    )
    UPDATE data
    SET DoorsReleased = ~DoorsReleased
    WHERE row BETWEEN @from AND @to

    SET @i = @i + 1

    COMMIT TRANSACTION T

END

Upvotes: 1

Views: 135

Answers (2)

paparazzo

Reputation: 45096

This will eliminate the loop:

UPDATE MyTable
   set DoorsReleased = ~DoorsReleased
 WHERE LitraID = 8175
   AND id BETWEEN 100000000 AND 300000000 
   AND DoorsReleased is not null -- if DoorsReleased is nullable
-- AND DoorsReleased <> ~DoorsReleased

If you are set on looping: the version below will NOT work. I thought ~ was part of the column name, but it is the bitwise NOT operator, so DoorsReleased <> ~DoorsReleased is true for every non-null row. The update keeps flipping the same rows and @@ROWCOUNT never drops to zero, so the loop never terminates.

select 1;
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE top (100000) MyTable
       set DoorsReleased = ~DoorsReleased
     WHERE LitraID = 8175
       AND id BETWEEN 100000000 AND 300000000 
       AND (       DoorsReleased <> ~DoorsReleased 
             or (  DoorsReleased is null and ~DoorsReleased is not null )
           )
END

Inside a transaction I don't think looping would have value, as the transaction log cannot clear until the transaction commits. And a batch size of 10,000 is small.

As stated in a comment, if you want to loop then batch on id rather than ROW_NUMBER(); recomputing row numbers over the whole range on every iteration is expensive.

You might also be able to use OFFSET; a sketch follows.
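
A minimal sketch of that idea, assuming SQL Server 2012 or later (OFFSET ... FETCH was added in 2012); the variable names are illustrative. Paging stays stable here because the update never changes which rows match the filter:

DECLARE @offset INT
DECLARE @batchSize INT
DECLARE @rows INT

SET @offset = 0
SET @batchSize = 100000
SET @rows = 1

WHILE @rows > 0 BEGIN

    ;WITH batch AS (
        -- page through the matching ids in stable id order
        SELECT id
        FROM MyTable
        WHERE LitraID = 8175
            AND id BETWEEN 100000000 AND 300000000
        ORDER BY id
        OFFSET @offset ROWS FETCH NEXT @batchSize ROWS ONLY
    )
    UPDATE t
    SET DoorsReleased = ~DoorsReleased
    FROM MyTable t
    INNER JOIN batch b ON b.id = t.id

    SET @rows = @@ROWCOUNT
    SET @offset = @offset + @batchSize

END

Each OFFSET page still has to skip everything before it, though, so later batches get progressively slower; batching on id avoids that rescanning.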

Upvotes: 1

Andy Nichols

Reputation: 3002

One of your issues is that your select statement in the loop fetches all records for LitraID = 8175, sets row numbers, then filters in the update statement. This happens on every iteration.

One way around this would be to collect all the ids for the update before entering the loop and store them in a temporary table. Then you can write a similar query to the one you have, but joining to this table of ids, as sketched below.
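
A rough sketch of that temp-table approach; #idsToUpdate and @batch are illustrative names, not from your code:

-- collect the target ids once, up front
SELECT id
INTO #idsToUpdate
FROM MyTable
WHERE LitraID = 8175
    AND id BETWEEN 100000000 AND 300000000

CREATE CLUSTERED INDEX IX_idsToUpdate ON #idsToUpdate (id)

DECLARE @batch TABLE (id INT PRIMARY KEY)

WHILE 1 = 1 BEGIN

    -- carve one batch off the work queue
    DELETE TOP (10000) FROM #idsToUpdate
    OUTPUT deleted.id INTO @batch

    IF @@ROWCOUNT = 0 BREAK

    UPDATE t
    SET DoorsReleased = ~DoorsReleased
    FROM MyTable t
    INNER JOIN @batch b ON b.id = t.id

    DELETE FROM @batch -- reset for the next iteration

END

DROP TABLE #idsToUpdate

Because each processed batch is deleted from #idsToUpdate, every iteration only touches ids that are still outstanding, no matter how many batches have already run.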

However, there is an even easier way if you know approximately how many records have LitraID = 8175 and if they are spread throughout the table, not bunched together with similar ids.

DECLARE @batchSize INT
DECLARE @minId INT
DECLARE @maxId INT

SET @batchSize = 10000 -- adjust according to how frequently LitraID = 8175 occurs; use larger numbers if it is infrequent
SET @minId = 100000000

WHILE @minId <= 300000000 BEGIN

    SET @maxId = @minId + @batchSize - 1
    IF @maxId > 300000000 BEGIN
        SET @maxId = 300000000
    END

    BEGIN TRANSACTION T

        UPDATE MyTable
        SET DoorsReleased = ~DoorsReleased
        WHERE LitraID = 8175 -- keep this filter so other rows in the id range are untouched
            AND id BETWEEN @minId AND @maxId

    COMMIT TRANSACTION T

    SET @minId = @maxId + 1
END

This will use the value of id to control the loop, meaning you don't need the extra step to calculate @iterationCount. It uses small batches so that the table isn't locked for long periods, it doesn't have any unnecessary SELECT statements, and the WHERE clause in the update is efficient, assuming id has an index (ideally together with LitraID).

It won't have exactly the same number of records updated in every transaction, but there's no reason it needs to.

Upvotes: 1
