Josh
Josh

Reputation: 6322

Data Clensing in SQL Server

I've got a table in the following format:

Name    Address1    Address2    Address3
Joe     NULL        1 A Road    London
Bob     2 A Lane    NULL        London

I'd like to run an update to move all the values left where there are nulls so that I end up with:

Name    Address1    Address2    Address3
Joe     1 A Road    London      NULL
Bob     2 A Lane    London      NULL

Please help!

Thanks, Josh

Upvotes: 1

Views: 129

Answers (2)

David Waters
David Waters

Reputation: 12028

BEGIN TRANSACTION
update person set Address1 = Address2, Address2 = null 
  where Address1 is null and Address2 is not null;
update person set Address2 = Address3, Address3 = null
  where Address2 is null and Address3 is not null;
update person set Address1 = Address2, Address2 = null
  where Address1 is null and Address2 is not null;
select top 100 * from person
ROLLBACK TRANSACTION

This would get very ugly if you had lots of address fields, but for 3 is acceptable. Remember to test any script copy and pasted from the interwebs in a transaction that you rollback so you don't get your database wiped :)

EDIT: With further info (6 address fields): You could do like above only repeat 5 times to move so in worst case you where still done

Declare StoredProcedure OneRound
BEGIN
  update person set Address1 = Address2, Address2 = null 
    where Address1 is null and Address2 is not null;
  update person set Address2 = Address3, Address3 = null
    where Address2 is null and Address3 is not null;
  update person set Address3 = Address4, Address4 = null 
    where Address3 is null and Address4 is not null;
  update person set Address4 = Address5, Address5 = null
    where Address4 is null and Address5 is not null;
  update person set Address5 = Address6, Address6 = null 
    where Address5 is null and Address6 is not null;
END

Declare StoredProc FixAllAddresses
BEGIN
  call OneRound
  call OneRound
  call OneRound
  call OneRound
  call OneRound
END

You could also use a cusor (warning psydo code) I have not looked up the syntax for T-SQL cursors and it has been a while so I will get the details wrong, check the syntax in the online help.

Declare cursor @personCursor for select ID,Address1,Address2,... from person;
OPEN @personCursor
FETCH @personCursor into (@personID, @addr1, @addr2, @addr3...)
while(@@FETCH_STATUS)
BEGIN
  IF @addr1 is null
  BEGIN
    IF @addr2 is not null
    BEGIN
      @addr1 = @addr2
      @addr2 = null
    END
    ELSE IF @addr3 is not null
      @addr1 = @addr3
      @addr3 = null
    BEGIN
    -- Boaring, ugly code goes here for addr4,addr5,addr6
    END
  END

  IF @addr2 is null
    IF @addr3 is not null
    BEGIN
      @addr2 = @addr3
      @addr3 = null
    END
    ELSE IF @addr4 is not null
      @addr2 = @addr4
      @addr4 = null
    BEGIN
    -- Boaring, ugly code goes here for addr5, addr6
    END
  BEGIN

  -- repeat for addr3, addr4,

  if @addr5 is null
  BEGIN
    IF addr6 is not null
    BEGIN
      @addr5 = @addr6
      @addr6 = null
    END
  END

  END
  update person set address1 = @addr1, address2 = @addr2, ... 
    where PersonId = @personId
  FETCH @personCursor into (@personID, @addr1, @addr2, @addr3...)  
END

The Stored proc begin called 5 times makes for less code and may be less error prone, the cursor only iterates through your people once, but does not filter. I suspect the stored proc solution below will be faster but it will depend on your data.

Upvotes: 3

Martin Smith
Martin Smith

Reputation: 453648

David's solution is the most efficient. One that might be more easily extend-able for greater numbers of columns is.

;WITH cte
     AS (SELECT *,
                 MAX(CASE WHEN RN=1 THEN value END) OVER (PARTITION BY ContactId) AS new_Address1,
                 MAX(CASE WHEN RN=2 THEN value END) OVER (PARTITION BY ContactId) AS new_Address2,
                 MAX(CASE WHEN RN=3 THEN value END) OVER (PARTITION BY ContactId) AS new_Address3
         FROM   #Addresses
                OUTER APPLY (SELECT ROW_NUMBER() OVER (ORDER BY CASE WHEN value IS NULL THEN 1 ELSE 0 END, idx) AS
                                   RN, idx, value
                             FROM   (VALUES(1,Address1),
                                           (2,Address2),
                                           (3,Address3)) t (idx, value)) d)
UPDATE cte
SET    Address1 = new_Address1,
       Address2 = new_Address2,
       Address3 = new_Address3  

Upvotes: 2

Related Questions