Venkataraman R
Venkataraman R

Reputation: 12959

Updating billion rows table

We are having a table with billion rows. We just want to UPDATE two columns to NULL for all rows. We were trying to do this in batches. But, we are concerned about the performance issues. We are having clustered columnstore index on this table. Does creating composite non-clustered index on these three columns would help ? or can we re-load the table to new table & swap the tables ?

Any inputs will be very helpful.

DECLARE @notNULLRecordsCount INT; 

SET @notNULLRecordsCount = 1;

WHILE @notNULLRecordsCount > 0 
    BEGIN 
    BEGIN TRANSACTION;

    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column1 = NULL,
            Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NOT NULL OR Column2 IS NOT NULL OR Column3 IS NOT NULL;


SET @notNULLRecordsCount = @@ROWCOUNT;
COMMIT TRANSACTION;
END 

Upvotes: 1

Views: 2339

Answers (2)

paparazzo
paparazzo

Reputation: 45096

My experience is that indexes actually slow this down as the indexes must be maintained.

Leaving a transaction open is not good. Wrapping all these in a transaction is not good. If you are going to wrap them all in a transaction then no purpose to breaking them up.

If you could key on just one column (no or) it would be faster.

You can shorten this

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column1 = NULL,
            Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NOT NULL 
         OR Column2 IS NOT NULL 
         OR Column3 IS NOT NULL
END

To get rid of the or

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column1 = NULL,
            Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NOT NULL
END

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column2 = NULL,
            Column3 = NULL     
    WHERE   Column2 IS NOT NULL
END

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column3 = NULL     
    WHERE   Column3 IS NOT NULL
END

Upvotes: 4

Venkataraman R
Venkataraman R

Reputation: 12959

Thanks for the suggestion by @Paparazzo, we have decided to do this as a two staged UPDATE, using single column. We do once using Column1 as NULL and another one using Column1 as NOT NULL, so that we cover all the rows.

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
BEGIN TRANSACTION;
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NULL;
COMMIT TRANSACTION;
END

select 1;
WHILE @@ROWCOUNT > 0 
BEGIN 
BEGIN TRANSACTION;
    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column1= NULL,
            Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NOT NULL;
COMMIT TRANSACTION;
END

Upvotes: 0

Related Questions