Overview
I'm converting a UnitsOrdered column from an int to a floating point type. My strategy was to create a new column with the new datatype and populate it with the old column's data. This will be performed on a table with hundreds of millions of records. To keep each transaction small, I was hoping to chunk the process, looping over some fixed number of rows at a time.
Issue
The issue I've found is that the updates get progressively slower, by a lot. I was hoping there was a simple explanation (and consequently a fix). My hunch was that updating rows 1-5 would be the same speed as rows 6-10, which would be the same as rows 1000001-1000005, but that doesn't appear to be the case.
Things I've tried: option(recompile) and with(tablock).
Notes
UnitsOrdered is the existing int column being converted; OrderItemID is the table's primary key. Happy to provide any more information. The schema of the table is super vanilla (by design)...it's roughly:
Create table dbo.OrderItems
(
    OrderItemID int primary key identity,
    UnitsOrdered int NOT NULL,
    UnitsOrdered1 real NOT NULL
);
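In practice the new column is added to the existing table rather than declared inline; a minimal sketch of that step, where the constraint name and zero default are illustrative assumptions:

-- Sketch: adding the target column up front (constraint name is made up for illustration).
-- On SQL Server 2012+ Enterprise, adding a NOT NULL column with a constant default
-- is a metadata-only change, so it is fast even on a table with hundreds of millions of rows.
alter table dbo.OrderItems
    add UnitsOrdered1 real NOT NULL
        constraint DF_OrderItems_UnitsOrdered1 default(0);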
SQL:
print 'Populating column data...';
GO
set nocount on;

declare
    @totalRows int = 0,
    @affectedRows int = 0,
    @rowsFetched int = 1000000,
    @rowsProcessed int = 0,
    @statusMessage nvarchar(100),
    @time datetime,
    @refreshEvery int = 10000000;  --declared but unused below

select @totalRows = Count(*)
from dbo.OrderItems;

while(1 = 1) begin
    set @time = getUtcDate();

    --page to the next chunk with OFFSET/FETCH and update it
    with cte as(
        select *
        from dbo.OrderItems with(tablock)
        order by
            OrderItemID
        offset @rowsProcessed ROWS
        fetch next @rowsFetched ROWS ONLY
    )
    update cte
    set
        UnitsOrdered1 = UnitsOrdered
    option(RECOMPILE);

    set @affectedRows = @@ROWCOUNT;
    if(@affectedRows = 0) begin
        break;
    end

    --increment processed rows
    set @rowsProcessed = @rowsProcessed + @affectedRows;

    --set status message (%% is escaped by raiserror)
    set @statusMessage = Concat('->',
        @affectedRows,
        ' rows updated in ',
        datediff(s, @time, getUtcDate()),
        ' seconds. ',
        (Cast(@rowsProcessed as float) / Cast(NULLIF(@totalRows, 0) as float)) * 100,
        '%% complete...');

    --we use raiserror with nowait so the message is emitted immediately instead of buffered
    raiserror(@statusMessage, 0, 1) with nowait;
end
GO
Progress:
Populating column data...
->1000000 rows updated in 1 seconds. 2.61037% complete...
->1000000 rows updated in 3 seconds. 5.22074% complete...
->1000000 rows updated in 2 seconds. 7.83111% complete...
->1000000 rows updated in 3 seconds. 10.4415% complete...
->1000000 rows updated in 3 seconds. 13.0519% complete...
->1000000 rows updated in 4 seconds. 15.6622% complete...
->1000000 rows updated in 4 seconds. 18.2726% complete...
->1000000 rows updated in 4 seconds. 20.883% complete...
->1000000 rows updated in 4 seconds. 23.4933% complete...
->1000000 rows updated in 5 seconds. 26.1037% complete...
->1000000 rows updated in 9 seconds. 28.7141% complete...
->1000000 rows updated in 5 seconds. 31.3245% complete...
->1000000 rows updated in 6 seconds. 33.9348% complete...
->1000000 rows updated in 6 seconds. 36.5452% complete...
->1000000 rows updated in 6 seconds. 39.1556% complete...
->1000000 rows updated in 7 seconds. 41.7659% complete...
(etc. for several million records)
->1000000 rows updated in 71 seconds. 84.3763% complete...
->1000000 rows updated in 74 seconds. 86.9867% complete...
->1000000 rows updated in 87 seconds. 89.597% complete...
->1000000 rows updated in 92 seconds. 92.2074% complete...
Thanks SO!
Answer
The progressive slowness is expected; it is an inherent problem with any row-number pagination technique. Every iteration's OFFSET forces SQL Server to scan past and discard all @rowsProcessed rows it has already handled before it reaches the next chunk, so the work per batch grows linearly and the total work grows quadratically. You will likely get better and more consistent performance by batching on ranges of OrderItemID values instead, so that the index on that column can be used to seek directly to, and touch only, the rows that need to be updated, as in the sketch below.
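A minimal sketch of that range-based approach, assuming OrderItemID is the clustered primary key as in the schema above (the variable names and fixed stride are illustrative):

set nocount on;

declare
    @batchSize int = 1000000,
    @fromId int,
    @maxId int;

--establish the key range once; each batch then advances by a fixed stride
select
    @fromId = Min(OrderItemID) - 1,
    @maxId = Max(OrderItemID)
from dbo.OrderItems;

while(@fromId < @maxId) begin
    --seeks on the clustered key; no rows are skipped or re-scanned
    update dbo.OrderItems
    set
        UnitsOrdered1 = UnitsOrdered
    where
        OrderItemID > @fromId
        and OrderItemID <= @fromId + @batchSize;

    set @fromId = @fromId + @batchSize;
end

Because every statement touches a contiguous clustered-key range, each batch costs roughly the same no matter how deep into the table it is; gaps in the identity sequence simply mean some batches update slightly fewer rows.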