Reputation: 6478
I have a huge table with millions and millions of rows.
It has a GUID column of type RAW(16). Due to an oversight, this table never had a primary key constraint, and a bug in the code inserted duplicate values into the GUID column (as well as some NULLs).
What I'd like to do is clean up the duplicates and NULLs so the column can become the primary key. The GUID values don't have to be preserved; they just all have to be unique. But the table in question has roughly 300 million rows, so I'm looking for an approach that won't cause too much downtime.
Thanks
Upvotes: 0
Views: 1401
Reputation: 23588
I'd use a merge statement, something along the lines of:
-- keep one row per guid value and null out the guid on the rest
merge into your_table tgt
using (select guid,
              row_id
       from   (select guid,
                      rowid row_id,
                      row_number() over (partition by guid order by rowid) rn
               from   your_table)
       where  rn != 1) src
on (tgt.rowid = src.row_id)
when matched then
  update set guid = null;
Add a unique index; this is safe even with the remaining NULLs, since Oracle doesn't index entirely-NULL keys, so a single-column unique index allows any number of NULL rows.
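A sketch of the index DDL, assuming the table and column names above (the online keyword, an Enterprise Edition feature, reduces blocking during the build):
create unique index your_table_guid_uk on your_table (guid) online;
Then do an update to fill in the NULLs: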
update your_table
set guid = sys_guid()
where guid is null;
Then alter your guid column to be not nullable and finally add the primary key constraint.
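A sketch of those final statements, reusing the index created above (the constraint name is an assumption):
alter table your_table modify guid not null;
alter table your_table add constraint your_table_pk primary key (guid) using index your_table_guid_uk;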
Alternatively, you could update the NULL and duplicate guid rows to sys_guid() directly in the merge, but you run the risk of generating a duplicate guid. Even as written, the final update may still fail, since sys_guid() isn't necessarily guaranteed to generate a unique value.
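For reference, a sketch of that single-pass variant (same assumed names; the duplicate-guid risk just described applies):
merge into your_table tgt
using (select row_id
       from   (select guid,
                      rowid row_id,
                      row_number() over (partition by guid order by rowid) rn
               from   your_table)
       where  rn != 1
       or     guid is null) src
on (tgt.rowid = src.row_id)
when matched then
  update set guid = sys_guid();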
Personally, I'd scrap the guid (although I realise this isn't always possible) and go with a sequence instead; uniqueness is much easier to guarantee that way!
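A minimal sketch of the sequence-based alternative, assuming you're free to add a new surrogate key column in place of the guid-based key above (all names illustrative):
create sequence your_table_seq cache 1000;
alter table your_table add (id number);
update your_table set id = your_table_seq.nextval;
alter table your_table modify id not null;
alter table your_table add constraint your_table_pk primary key (id);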
Upvotes: 1