Reputation: 572
I have a table that has about 400,000+ rows. I am writing some pattern matching code but need to clean up a column before I do so. This boils down to doing a replace-like operation.
I tried listing them all out one at a time...
Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ')
Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ')
Since I have over 500 of these...it took too long.
Now I am trying to nest them...
Update T_ADDRESS set ADDR_LINEONE = REPLACE(REPLACE(ADDR_LINEONE,' southwest ',' sw '),' southeast ',' se ')
But this is still painfully slow. I need to make this code work on tables of all sizes (1 record to 5 million records).
Anyone have any advice? I am using SQL Server by the way.
Upvotes: 2
Views: 8370
Reputation: 129
Old question, but still useful. Your query updates all the columns of the table, with a large penalty if this column is indexed. You can improve the speed by filtering the fields that do not need to be updated as follows:
Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southeast %'
Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southwest %'
Unless all your addresses contain southeast or southwest, this will be an order of magnitude faster!
Upvotes: 0
Reputation: 3324
You could write the update statement to use a CASE statement in place of the multiple REPLACEs although I've no idea whether this will execute any faster for you.
Upvotes: 0
Reputation: 294387
You have to always scan the table end-to-end no matter how fancy you do the REPLACE. This is what is killing performance, and it cannot be changed since you have to way of indexing the ADDR_LINEONE field in any sensible manner.
Since this should be a one-time only operation, the long time should not matter.
If this is a repeated operation, then your problem is not here, is in how you load the data into the table: do the transformation before you save the data, otherwise you stand no chance.
Upvotes: 10
Reputation: 4520
Create #TEMP table with new values then inner join Something like this
create table #TempValues
(oldAres varchar(12),newadres varchar(2))
insert into #TempValues
select 'southeast','se'
union all
select 'southwest','sw'
update T_ADDRESS
set addr_lineone=t.newadres
from T_ADDRESS inner join #TempValues t on T_ADDRESS.addr_lineone=t.oldAdres
Upvotes: 0
Reputation: 300719
Write a CLR procedure. TSQL is not great at (or designed for) handling large numbers of string manipulations.
Regular Expressions Make Pattern Matching And Data Extraction Easier
Upvotes: 2