Reputation: 4474

Fastest way to modify each row in a table

What's the recommended way of updating a relatively large table (~70 million rows), in order to replace a foreign key column with an id of a different table (indirectly linked by the current key)?

Let's say I have three tables:

Person
  Id long,
  Group_id long   --> foreign key to Group table

Group
  Id long
  Device_id long  --> foreign key to Device table

Device
  Id long

I would like to update the Person table to have a direct foreign key to the Device table, i.e.:

Person
  Id long,
  Device_Id long  --> foreign key to Device table

Device
  Id long

The query would look something like this:

-- replace Group_id with Device_id
update p from Person p
  inner join Group g
  on g.Id = p.Group_id
set p.Group_id = g.Device_id

I would first drop the FK constraint, and then rename the column afterwards.

Will this work?
Is there a better way?
Can I speed it up? (while this query is running, everything else will be offline, server is UPS backed-up, so I'd like to skip any transactional updates)

Upvotes: 0

Answers (3)

UnhandledExcepSean

Reputation: 12804

Drop indexes on the table you are updating and recreate after the update is complete.
Drop constraints on the table you are updating and recreate appropriately (you are changing the reference after all) after the update is complete.
Turn off triggers on the table you are updating and enable after the update is complete.
You might want to consider running batches. I personally would create a loop and batch update 10k rows at a time. This seemed to cause the fewest problems on my hardware (running out of disk space, etc). You could order the update and track the PK so you know where you are at. Or create a bit column that is set when a particular record is updated; this method might make it easier overall as you won't need to track the PK at all.

An example of such a loop might look like this:

select top 1 * from table

DECLARE @MinPK BIGINT
DECLARE @MaxPK BIGINT
SET @MinPK=0
SET @MaxPK=0

WHILE @@ROWCOUNT>0
BEGIN
    SELECT
        @MaxPK=MAX(a.PK)
    FROM (
        SELECT TOP 3
            PK
        FROM Table
        WHERE PK>@MinPK
        ORDER BY PK ASC
    ) a

    --Change this to an update
    SELECT
        PK
    FROM Table
    WHERE PK>@MinPK
    AND PK<=@MaxPK

    SET @MinPK=@MaxPK
END

Upvotes: 1

Bohemian

Reputation: 424983

Your idea won't "work", unless there is only one device per group (which would be ridiculous, so I assume not).

The problem is that you would have to cram many device_id values into one column in the person table - that's why you've got a group table in the first place.

Upvotes: 0

RichardTheKiwi

Reputation: 107706

It would work if you wrote the UPDATE properly (assuming this is SQL Server)

update p
set p.Group_id = g.Device_id
from Person p
inner join Group g on g.Id = p.Group_id

Apart from that, it's a really smart move to re-use, then rename the column*. Can't think of any smart way to make this any faster, unless you wish to use a WHILE loop and person.Id markers to break up the updates into batches.

* - ALTER TABLE DROP COLUMN DOES NOT RECLAIM THE SPACE THE COLUMN TOOK

Upvotes: 1

Fastest way to modify each row in a table

Answers (3)

Related Questions