Reputation: 652
I have written the following script;
----------------
-- GLOBAL FAMILY DEDUPLICATION
----------------
--Global Family Deduplication of parents
update mstr
SET
IsGlobalFamilyUnique = case
when (rn > 1 and IsGlobalFamilyUnique is null) then 0
when (rn = 1 and IsGlobalFamilyUnique is null) then 1
else IsGlobalFamilyUnique end,
GlobalFamilyDupID = [NewGlobalFamilyDupID],
GroupIdentifier = ControlNumber
from(
select
ControlNumber,
MD5hash,
IsglobalFamilyUnique,
GlobalFamilyDupID,
TopLvlGuid,
GroupIdentifier,
first_value(ControlNumber) over (partition by [MD5Hash] order by ID) [NewGlobalFamilyDupID],
row_number() over (partition by [MD5Hash] order by ID asc) [RN]
from ( --we only need non duplicated parents, and the unique set for deduplication
select * from dbo.tblMaster where ischild = 0 and (IsGlobalFamilyUnique = 1 or IsGlobalFamilyUnique is null)
)toDedupe
)mstr
Test data:
declare @testdata as table (
controlnumber nvarchar(100),
MD5Hash nvarchar(100),
IsGlobalFamilyUnique bit,
GlobalFamilyDupId nvarchar(50),
TopLvlGuid nvarchar(50),
IsChild bit
)
insert into @testdata values
('BUL00000001','a0e0e90fe689ad07b65c461097558837',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',0),
('BUL00000002','7072a5bcb002868823b8637b7f6b70fb',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',1),
('BUL00000003','3895486d531c251126244cd63861fa3a',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',1),
('BUL00000004','46a6eb25985931c9f15dafe3c4fe88ce',NULL,NULL,'00005819-661b-47ed-8044-77872b39487e',0),
('BUL00000005','5cd287d2ed56b868c11e1a0ea37e36bb',NULL,NULL,'000302ec-6bf7-433c-bdf3-be3aa0420468',0),
('BUL00000006','4a1335a5e09a1e6ba73b6004b70742b0',NULL,NULL,'0003df89-d7f2-4824-9364-a40d4ca6dcae',0),
('BUL00000007','feeb4fd491b8c6584f740154278b8494',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',0),
('BUL00000008','689ab2a7527b0db4c3717939da2db52f',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1),
('BUL00000009','9cc73a89f66df1140622af27aa4c03fd',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1),
('BUL00000010','36ebcbd2fe90b724c0cd71012722266d',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1)
However I don't like that every time the script is run, it will be setting the IsGlobalFamilyUnique
column, even if it has been previously set. It seems like a waste of resources and potentially dangerous thing to do.
I want the script to only set the IsGlobalFamilyUnqiue
field for data that has not yet been deduplicated, i.e. has a IsGlobalFamilyUnique
column is set to NULL
. It should only de-duplicate where IsChild
is 0
and against data that is unique, i.e. IsGlobalFamilyUnique
is 1
.
How can I re-write this code to make it more efficient and remove the risk to potentially overwrite already set data?
Upvotes: 1
Views: 33
Reputation: 1270181
Assuming the other values are being set, there is essentially no overhead at all. The overhead for an update tends to be:
The rest all occurs in memory. In fact, SQL Server probably constructs the entire record in memory, copies it to a page in memory and then writes out the entire page (actually, I think it is a group of four pages). Whether one column or three columns change, there is almost no difference.
Now, if all three columns might remain the same, then there is an optimization. Just use a where
clause to limit the updates to records that have changed.
Upvotes: 2