BlackjacketMack

Reputation: 5692

Looping over update statement gets progressively slower with SQL Server

Overview

I'm converting a UnitsOrdered column from int to a floating-point type. My strategy is to create a new column with the new datatype and populate it with the old column's data. This will be performed on a table with hundreds of millions of records. To keep each transaction small, I was hoping to chunk the process, looping over a fixed number of rows at a time.
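
For context, the new column would have been added along these lines (a sketch, not the exact DDL; the name UnitsOrdered1 matches the script below, and float/NULL are assumptions):

alter table dbo.OrderItems
    add UnitsOrdered1 float null; --nullable, so the ADD itself is a quick metadata-only change

Making the column nullable keeps the ALTER itself fast; the expensive part is backfilling the data, which is what the batching below is for.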

Issue

The issue I've found is that the update gets progressively slower, and by a lot. I was hoping there was a simple explanation (and consequently a fix). My hunch was that updating rows 1-5 would be the same speed as rows 6-10, which would be the same as rows 1000001-1000005, but that doesn't appear to be the case.

Notes

SQL:

print 'Populating column data...';
GO
set nocount on;
declare 
    @totalRows int = 0,
    @affectedRows int = 0,
    @rowsFetched int = 1000000,
    @rowsProcessed int = 0,
    @statusMessage nvarchar(100),
    @time datetime;

select @totalRows = Count(*)
from dbo.OrderItems

while(1 = 1) begin

    set @time = getUtcDate();

    --page to the next batch by row position, then update it through the CTE
    ;with cte as(
        select *
        from dbo.OrderItems with(tablock)
        order by OrderItemID
        offset @rowsProcessed rows
        fetch next @rowsFetched rows only
    )
    update cte
    set UnitsOrdered1 = UnitsOrdered
    option(recompile);

    set @affectedRows = @@ROWCOUNT;

    --no rows left to update: done
    if(@affectedRows = 0) begin
        break;
    end

    --increment processed rows
    set @rowsProcessed = @rowsProcessed + @affectedRows;

    --set status message (%% is escaped to a literal % by raiserror)
    set @statusMessage = Concat(    '->',
                                    @affectedRows,
                                    ' rows updated in ',
                                    datediff(s,@time,getUtcDate()),
                                    ' seconds. ',
                                    (Cast(@rowsProcessed as float) / Cast(NULLIF(@totalRows,0) as float))*100,
                                    '%% complete...');

    --we use raiserror with nowait so the message is output immediately instead of buffered
    raiserror(@statusMessage, 0, 1) with nowait;

end
GO

Progress:

    Populating column data...
    ->1000000 rows updated in 1 seconds. 2.61037% complete...
    ->1000000 rows updated in 3 seconds. 5.22074% complete...
    ->1000000 rows updated in 2 seconds. 7.83111% complete...
    ->1000000 rows updated in 3 seconds. 10.4415% complete...
    ->1000000 rows updated in 3 seconds. 13.0519% complete...
    ->1000000 rows updated in 4 seconds. 15.6622% complete...
    ->1000000 rows updated in 4 seconds. 18.2726% complete...
    ->1000000 rows updated in 4 seconds. 20.883% complete...
    ->1000000 rows updated in 4 seconds. 23.4933% complete...
    ->1000000 rows updated in 5 seconds. 26.1037% complete...
    ->1000000 rows updated in 9 seconds. 28.7141% complete...
    ->1000000 rows updated in 5 seconds. 31.3245% complete...
    ->1000000 rows updated in 6 seconds. 33.9348% complete...
    ->1000000 rows updated in 6 seconds. 36.5452% complete...
    ->1000000 rows updated in 6 seconds. 39.1556% complete...
    ->1000000 rows updated in 7 seconds. 41.7659% complete...
(etc. for several million records)

    ->1000000 rows updated in 71 seconds. 84.3763% complete...
    ->1000000 rows updated in 74 seconds. 86.9867% complete...
    ->1000000 rows updated in 87 seconds. 89.597% complete...
    ->1000000 rows updated in 92 seconds. 92.2074% complete...

Thanks SO!

Upvotes: 0

Views: 320

Answers (1)

Dan Guzman

Reputation: 46213

The progressive slowness is expected; it's an inherent problem with any row-number pagination technique. Each OFFSET batch has to scan past every row already processed before it reaches the next batch, so the cost of batch N grows with N. You will likely get better and more consistent performance by updating by ranges of OrderItemID values, so that an index on the column can be used to efficiently locate and touch only the rows that need to be updated.
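
A minimal sketch of that key-range approach, assuming OrderItemID is an int clustering key (the batch size and progress message are illustrative):

set nocount on;
declare
    @batchSize int = 1000000,
    @fromID int,
    @maxID int,
    @msg nvarchar(100);

select @fromID = min(OrderItemID), @maxID = max(OrderItemID)
from dbo.OrderItems;

while @fromID <= @maxID begin

    --bounded seek on the key: no rows are skipped or re-scanned
    update dbo.OrderItems
    set UnitsOrdered1 = UnitsOrdered
    where OrderItemID >= @fromID
      and OrderItemID < @fromID + @batchSize;

    set @msg = Concat('->', @@ROWCOUNT, ' rows updated through ID ', @fromID + @batchSize - 1);
    raiserror(@msg, 0, 1) with nowait;

    --advance the key range regardless of how many rows it contained
    set @fromID = @fromID + @batchSize;

end

Because every iteration is a bounded seek rather than a scan-and-skip, each batch touches roughly the same number of pages, so per-batch time stays flat instead of growing with the number of rows already processed.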

Upvotes: 2
