Sam
Sam

Reputation: 652

Only update values that need updating

I have written the following script;

----------------
-- GLOBAL FAMILY DEDUPLICATION
----------------
--Global Family Deduplication of parents
update mstr
    SET 
        IsGlobalFamilyUnique = case 
            when (rn > 1 and IsGlobalFamilyUnique is null) then 0 
            when (rn = 1 and IsGlobalFamilyUnique is null) then 1
            else IsGlobalFamilyUnique end,
        GlobalFamilyDupID = [NewGlobalFamilyDupID],
        GroupIdentifier = ControlNumber
    from(
    select
        ControlNumber,
        MD5hash,
        IsglobalFamilyUnique,
        GlobalFamilyDupID,
        TopLvlGuid,
        GroupIdentifier,
        first_value(ControlNumber) over (partition by [MD5Hash] order by ID) [NewGlobalFamilyDupID],
        row_number() over (partition by [MD5Hash] order by ID asc) [RN]
    from ( --we only need non duplicated parents, and the unique set for deduplication
        select * from dbo.tblMaster where ischild = 0 and (IsGlobalFamilyUnique = 1 or IsGlobalFamilyUnique is null)
    )toDedupe
)mstr

Test data:

 declare @testdata as table (
    controlnumber nvarchar(100),
    MD5Hash nvarchar(100),
    IsGlobalFamilyUnique bit,
    GlobalFamilyDupId nvarchar(50),
    TopLvlGuid nvarchar(50),
    IsChild bit
)

insert into @testdata values
('BUL00000001','a0e0e90fe689ad07b65c461097558837',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',0),
('BUL00000002','7072a5bcb002868823b8637b7f6b70fb',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',1),
('BUL00000003','3895486d531c251126244cd63861fa3a',NULL,NULL,'000049d2-4edd-4ac9-8a14-ceb08e7b9883',1),
('BUL00000004','46a6eb25985931c9f15dafe3c4fe88ce',NULL,NULL,'00005819-661b-47ed-8044-77872b39487e',0),
('BUL00000005','5cd287d2ed56b868c11e1a0ea37e36bb',NULL,NULL,'000302ec-6bf7-433c-bdf3-be3aa0420468',0),
('BUL00000006','4a1335a5e09a1e6ba73b6004b70742b0',NULL,NULL,'0003df89-d7f2-4824-9364-a40d4ca6dcae',0),
('BUL00000007','feeb4fd491b8c6584f740154278b8494',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',0),
('BUL00000008','689ab2a7527b0db4c3717939da2db52f',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1),
('BUL00000009','9cc73a89f66df1140622af27aa4c03fd',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1),
('BUL00000010','36ebcbd2fe90b724c0cd71012722266d',NULL,NULL,'0005cd66-db18-4f63-a61d-c901033a69da',1)

However I don't like that every time the script is run, it will be setting the IsGlobalFamilyUnique column, even if it has been previously set. It seems like a waste of resources and potentially dangerous thing to do.

I want the script to only set the IsGlobalFamilyUnqiue field for data that has not yet been deduplicated, i.e. has a IsGlobalFamilyUnique column is set to NULL. It should only de-duplicate where IsChild is 0 and against data that is unique, i.e. IsGlobalFamilyUnique is 1.

How can I re-write this code to make it more efficient and remove the risk to potentially overwrite already set data?

Upvotes: 1

Views: 33

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270181

Assuming the other values are being set, there is essentially no overhead at all. The overhead for an update tends to be:

  • Locking resources
  • Logging the action
  • Dealing with dirty writes

The rest all occurs in memory. In fact, SQL Server probably constructs the entire record in memory, copies it to a page in memory and then writes out the entire page (actually, I think it is a group of four pages). Whether one column or three columns change, there is almost no difference.

Now, if all three columns might remain the same, then there is an optimization. Just use a where clause to limit the updates to records that have changed.

Upvotes: 2

Related Questions