Reputation: 63
I have over 7 million rows, otherwise I would use Excel.
My address column has a varying number of words. Some are as short as '123 bay street', while others can be as long as '1234 west spring hill drive apt 123'.
My goal is to put each word into its own column. I was able to get the first word, using the query below. But I can't create a query efficient enough to do the rest.
update X
set X.Address_number = Y.[address]
from
(SELECT
unique_id,
CASE
WHEN SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1)) = ''
THEN phy_addr1 + ' '
ELSE SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1))
END 'address'
FROM
[RD_GeoCode].[dbo].[PA_Stg_excel]) as Y
inner join
[RD_GeoCode].[dbo].[rg_ApplicationData_AllForms_20160401_address] as X on X.unique_id = Y.unique_id
where
X.Address_number is null
Upvotes: 0
Views: 155
Reputation: 28930
you need to have a Numbers table and one of the split strings mentioned here.once you have that ,then its simple..
-----String splitter function
CREATE FUNCTION dbo.SplitStrings_Numbers
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM dbo.Numbers
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
);
GO
you can use the above function like below..
select
*
from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b
instead of updating values into same table,i suggest create some other table which has links to above table.This requires some schema modification to your existing table
create table addressreferences
(
addresss varchar(300),
delimitedvalue varchar(100)
)
insert into addressreferences
select
address,b.*
from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b
This is just a pseudo code to give an idea,you will have to take care of references...Updating same table will not work ,because you are not aware how many rows an address column can span
Update:
I think a trigger will be better suit for your scenario instead of references ..But you have to do an insert first on references table for existing values .here is some pseudo code..
create trigger trg_test
after insert,update,delete
on dbo.yourtable
as
begin
---check for inserts
if exists(Select * from inserted)
begin
insert into addressreferences
select address,b.* from inserted i
cross apply
dbo.splitstrings(address,' ') b
--check for deletes
if exists(select 1 from deleted)
begin
delete * from
deleted d
join
adressreferences a
on a.address=d.address
end
if update(address)
begin
---here i recommend doing delete first since your old address and new one may not have equal rows
delete * from
deleted d
join
addressreferences a
on a.address=d.address
--then do a insert
insert into addressreferences
select address,a.* from
inserted i
join
addressreferences a
on a.address=i.address
end
end
end
Upvotes: 1
Reputation: 133
A sequence table is a good thing. As in Louis Davidson's 'Pro Relational database design and implementation', you can create it
CREATE SCHEMA tools
go
CREATE TABLE tools.sequence
(
i int CONSTRAINT PKtools_sequence PRIMARY KEY
)
-- Then I will load it, up to 99999:
;WITH DIGITS (i) as(--set up a set of numbers from 0-9
SELECT i
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as digits (i))
--builds a table from 0 to 99999
,sequence (i) as (
SELECT D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i)
--+ (100000*D6.i)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
/* CROSS JOIN digits AS D6 */)
INSERT INTO tools.sequence(i)
SELECT i
FROM sequence
Then split your input, again code from L. Davidson's book
DECLARE @delimitedList VARCHAR(100) = '1,2,3,4,5'
SELECT word = SUBSTRING(',' + @delimitedList + ',',i + 1,
CHARINDEX(',',',' + @delimitedList + ',',i + 1) - i - 1)
FROM tools.sequence
WHERE i >= 1
AND i < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','
ORDER BY i
using a space rather than a comma.
Finally, I would think of using the PIVOT operator to turn the rows into columns, but for it to work, you need to specify the maximum number of words.
Upvotes: 1