ObiWanCannoli
ObiWanCannoli

Reputation: 63

SQL separating address into mutiple columns using spaces

I have over 7 million rows, otherwise I would use Excel.

My address column has a varying number of words. Some are as short as '123 bay street', while others can be as long as '1234 west spring hill drive apt 123'.

My goal is to put each word into its own column. I was able to get the first word, using the query below. But I can't create a query efficient enough to do the rest.

update X
set X.Address_number = Y.[address]
from 
     (SELECT 
          unique_id,
          CASE 
             WHEN SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1)) = ''
                THEN phy_addr1 + ' '
             ELSE SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1))
          END 'address'
      FROM 
         [RD_GeoCode].[dbo].[PA_Stg_excel]) as Y
  inner join 
      [RD_GeoCode].[dbo].[rg_ApplicationData_AllForms_20160401_address] as X on X.unique_id = Y.unique_id
  where 
      X.Address_number is null

Upvotes: 0

Views: 155

Answers (2)

TheGameiswar
TheGameiswar

Reputation: 28930

you need to have a Numbers table and one of the split strings mentioned here.once you have that ,then its simple..

-----String splitter function

CREATE FUNCTION dbo.SplitStrings_Numbers
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN
   (
       SELECT Item = SUBSTRING(@List, Number, 
         CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
       FROM dbo.Numbers
       WHERE Number <= CONVERT(INT, LEN(@List))
         AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
   );
GO

you can use the above function like below..

select 
*
 from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b

instead of updating values into same table,i suggest create some other table which has links to above table.This requires some schema modification to your existing table

create table addressreferences
(
addresss varchar(300),
delimitedvalue varchar(100)
)

insert into addressreferences
 select 
    address,b.*
     from yourtable t
    cross apply
    dbo.SplitStrings_Numbers(t.address,' ') b

This is just a pseudo code to give an idea,you will have to take care of references...Updating same table will not work ,because you are not aware how many rows an address column can span

Update:
I think a trigger will be better suit for your scenario instead of references ..But you have to do an insert first on references table for existing values .here is some pseudo code..

create trigger trg_test
after insert,update,delete 
on dbo.yourtable
as
begin
---check for inserts
if exists(Select * from inserted)
begin
insert into addressreferences
select address,b.* from inserted i
cross apply
dbo.splitstrings(address,' ') b

--check for deletes
if exists(select 1 from deleted)
begin

delete * from 
deleted d
join
adressreferences a
on a.address=d.address

end

if update(address)
begin
---here i recommend doing delete first since your old address and new one may not have equal rows

delete * from
deleted d
join
addressreferences a
on a.address=d.address

--then do a insert
insert into addressreferences
select address,a.* from
inserted i
join
addressreferences a
on a.address=i.address

end

end

end

Upvotes: 1

Eugene Bykov
Eugene Bykov

Reputation: 133

A sequence table is a good thing. As in Louis Davidson's 'Pro Relational database design and implementation', you can create it

CREATE SCHEMA tools
go
CREATE TABLE tools.sequence
(
i int CONSTRAINT PKtools_sequence PRIMARY KEY
)

-- Then I will load it, up to 99999:
;WITH DIGITS (i) as(--set up a set of numbers from 0-9
SELECT i
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as digits (i))
--builds a table from 0 to 99999
,sequence (i) as (
SELECT D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i)
--+ (100000*D6.i)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
/* CROSS JOIN digits AS D6 */)
INSERT INTO tools.sequence(i)
SELECT i
FROM sequence

Then split your input, again code from L. Davidson's book

DECLARE @delimitedList VARCHAR(100) = '1,2,3,4,5'
SELECT word = SUBSTRING(',' + @delimitedList + ',',i + 1,
CHARINDEX(',',',' + @delimitedList + ',',i + 1) - i - 1)
FROM tools.sequence
WHERE i >= 1
AND i < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','
ORDER BY i

using a space rather than a comma.

Finally, I would think of using the PIVOT operator to turn the rows into columns, but for it to work, you need to specify the maximum number of words.

Upvotes: 1

Related Questions