PMC
PMC

Reputation: 4766

Split address column into multiple on large table

I have a table with 500k rows where the address is in one field, delimited by Char(13)+Char(10). I have added 5 fields to the table in the hope of splitting this up.

Found online this split function that seems to perform well as I cannot use parsename due to there being 5 parts and also that the . may be in the field.

This is a table-valued function so I would have to loop the rows and update the record, previously I would have used a cursor or sql while or possibly even c# to do this but I feel their must be a cte or set based answer to do this.

Upvotes: 0

Views: 1038

Answers (2)

James Culshaw
James Culshaw

Reputation: 1057

You have couple of options:

You can create a temp table and then parse the address into the temp table and then update the original table by joining it to the temp table.

or

You can write your own T-SQL functions and use those functions in your update statement function like follows:

UPDATE myTable
   SET address1 = myGetAddress1Function(address),
       address2 = myGetAddress2Function(address)....

Upvotes: 3

Aaron Bertrand
Aaron Bertrand

Reputation: 280490

So given some source data:

CREATE TABLE dbo.Addresses
(
  AddressID INT IDENTITY(1,1),
  [Address] VARCHAR(255),
  Address1  VARCHAR(255),
  Address2  VARCHAR(255),
  Address3  VARCHAR(255),
  Address4  VARCHAR(255),
  Address5  VARCHAR(255)
);

INSERT dbo.Addresses([Address])
SELECT 'foo
bar'
UNION ALL SELECT 'add1
add2
add3
add4
add5';

Let's create a function that returns the address parts in a sequence:

CREATE FUNCTION dbo.SplitAddressOrdered
(
    @AddressID  INT,
    @List       VARCHAR(MAX),
    @Delimiter  VARCHAR(32)
)
RETURNS TABLE
AS
    RETURN 
    (
      SELECT  
          AddressID = @AddressID, 
          rn = ROW_NUMBER() OVER (ORDER BY Number), 
          AddressItem = Item 
        FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
          CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
      ) AS y
    );
GO

Now you can do this (you will have to run the query 5 times):

DECLARE 
  @i INT = 1, 
  @sql NVARCHAR(MAX),
  @src NVARCHAR(MAX) = N';WITH x AS 
    (
      SELECT a.*, Original = s.AddressID, s.rn, s.AddressItem
      FROM dbo.Addresses AS a
      CROSS APPLY dbo.SplitAddressOrdered(a.AddressID, a.Address, 
        CHAR(13) + CHAR(10)) AS s WHERE rn = @i
    )';
WHILE @i <= 5
BEGIN
   SET @sql = @src + N'UPDATE x SET Address' + RTRIM(@i)
     + ' = CASE WHEN AddressID = Original AND rn = ' 
     + RTRIM(@i) + ' THEN AddressItem END;';

   EXEC sp_executesql @sql, N'@i INT', @i;

   SET @i += 1;
END

Then you can drop the Address column:

ALTER TABLE dbo.Addresses DROP COLUMN [Address];

Then the table has:

AddressID  Address1  Address2  Address3  Address4  Address5
---------  --------  --------  --------  --------  --------
1          foo       bar       NULL      NULL      NULL
2          add1      add2      add3      add4      add5

I'm sure someone more clever than I will show how to utilize that function without having to loop.

I could also envision a slight change to the function that would allow you to simply pull out a certain element... hold please...

EDIT

Here's a scalar function that is more expensive on its own but allows you to make one pass of the table instead of 5:

CREATE FUNCTION dbo.ElementFromOrderedList
(
    @List       VARCHAR(MAX),
    @Delimiter  VARCHAR(32),
    @Index      SMALLINT
)
RETURNS VARCHAR(255)
AS
BEGIN
    RETURN 
    (
      SELECT Item 
        FROM (SELECT rn = ROW_NUMBER() OVER (ORDER BY Number),
          Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
          CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
      ) AS y WHERE rn = @Index
    );
END
GO

Now the update, given the above table above (prior to the update and prior to the drop), is simply:

UPDATE dbo.Addresses
  SET Address1 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 1),
      Address2 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 2),
      Address3 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 3),
      Address4 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 4),
      Address5 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 5);

Upvotes: 3

Related Questions