James
James

Reputation: 572

SQL Performance (Replace)

I have a table that has about 400,000+ rows. I am writing some pattern matching code but need to clean up a column before I do so. This boils down to doing a replace-like operation.

I tried listing them all out one at a time...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ')
 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ')

Since I have over 500 of these...it took too long.

Now I am trying to nest them...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(REPLACE(ADDR_LINEONE,' southwest ',' sw '),' southeast ',' se ')

But this is still painfully slow. I need to make this code work on tables of all sizes (1 record to 5 million records).

Anyone have any advice? I am using SQL Server by the way.

Upvotes: 2

Views: 8370

Answers (5)

jcmeyrignac
jcmeyrignac

Reputation: 129

Old question, but still useful. Your query updates all the columns of the table, with a large penalty if this column is indexed. You can improve the speed by filtering the fields that do not need to be updated as follows:

Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southeast %'
Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southwest %'

Unless all your addresses contain southeast or southwest, this will be an order of magnitude faster!

Upvotes: 0

Chris W
Chris W

Reputation: 3324

You could write the update statement to use a CASE statement in place of the multiple REPLACEs although I've no idea whether this will execute any faster for you.

Upvotes: 0

Remus Rusanu
Remus Rusanu

Reputation: 294387

You have to always scan the table end-to-end no matter how fancy you do the REPLACE. This is what is killing performance, and it cannot be changed since you have to way of indexing the ADDR_LINEONE field in any sensible manner.

Since this should be a one-time only operation, the long time should not matter.

If this is a repeated operation, then your problem is not here, is in how you load the data into the table: do the transformation before you save the data, otherwise you stand no chance.

Upvotes: 10

adopilot
adopilot

Reputation: 4520

Create #TEMP table with new values then inner join Something like this

create table #TempValues
(oldAres varchar(12),newadres varchar(2))

insert into #TempValues
select 'southeast','se'
union all 
select 'southwest','sw'

update T_ADDRESS
set addr_lineone=t.newadres
from T_ADDRESS inner join #TempValues t on T_ADDRESS.addr_lineone=t.oldAdres

Upvotes: 0

Mitch Wheat
Mitch Wheat

Reputation: 300719

Write a CLR procedure. TSQL is not great at (or designed for) handling large numbers of string manipulations.

Regular Expressions Make Pattern Matching And Data Extraction Easier

Upvotes: 2

Related Questions