Reputation: 9
I have a large table that i am working on and for nearly all of the columns i need to use the replace statement to remove single and double quotes. The code looks like this:
SET QUOTED_IDENTIFIER ON
Update dbo.transactions set transaction_name1 = Replace(transaction_name1,'''','')
Update dbo.transactions set transaction_name2 = Replace(transaction_name2,'''','')
Update dbo.transactions set transaction_name3 = Replace(transaction_name3,'''','')
Update dbo.transactions set transaction_name4 = Replace(transaction_name4,'''','')
Update dbo.transactions set transaction_name5 = Replace(transaction_name5,'''','')
I have not put an index on the table as was not sure exactly what column would be any good being that i'm updating nearly all the columns. If i sorted the table asc by the primary key would that help increase performance?
Over then that the statements have been running for over 2 hours with no error messages and wondered if there is a solution to this performance issue, besides the usual more hardware changes? If someone could advise on ways of increasing performance of the script.
Cheers, Peter
Upvotes: 0
Views: 4236
Reputation: 108336
You can make this a single UPDATE statement:
UPDATE transactions SET
transaction_name1 = Replace(transaction_name1,'''',''),
transaction_name2 = Replace(transaction_name2,'''','')
... (and so on)
That would likely improve the performance by something approaching a factor of 5.
Edit:
Since this is a one shot thing on a huge dataset (90MM rows), I suggest adding in a where clause and running it in batches.
If your transactions have a primary key, partition the updates on that, doing maybe 500k at once.
Do this in a loop with explicit transactions to keep your log use to a minimum:
DECLARE @BaseID INT, @BatchSize INT
SELECT @BaseID = MAX(YourKey), @BatchSize = 500000 FROM transactions
WHILE @BaseID > 0 BEGIN
PRINT 'Updating from ' + CAST(@BaseID AS VARCHAR(20))
-- perform update
UPDATE transactions SET
transaction_name1 = Replace(transaction_name1,'''',''),
transaction_name2 = Replace(transaction_name2,'''','')
-- ... (and so on)
WHERE YourKey BETWEEN @BaseID - @BatchSize AND @BaseID
SET @BaseID = @BaseID - @BatchSize - 1
END
Another note:
If the quotes must not appear in your data, you can create a check constraint to keep them out. It's a last ditch effort as any app attempting to put them in would need to handle a database exception, but it will keep your data clean. Something like this might do it:
ALTER TABLE transactions
ADD CONSTRAINT CK_NoQuotes CHECK(
CHARINDEX('''',transaction_name1)=0 AND
CHARINDEX('''',transaction_name2)=0 AND
-- and so on...
)
Upvotes: 4
Reputation: 103637
You might try making this only a single UPDATE and only updating the rows that need it:
UPDATE dbo.transactions
SET transaction_name1 = REPLACE(transaction_name1,'''',''),
transaction_name2 = REPLACE(transaction_name2,'''',''),
transaction_name3 = REPLACE(transaction_name3,'''',''),
transaction_name4 = REPLACE(transaction_name4,'''',''),
transaction_name5 = REPLACE(transaction_name5,'''','')
WHERE
CHARINDEX('''',transaction_name1)>0
OR CHARINDEX('''',transaction_name2)>0
OR CHARINDEX('''',transaction_name3)>0
OR CHARINDEX('''',transaction_name4)>0
OR CHARINDEX('''',transaction_name5)>0
Upvotes: 1
Reputation: 19263
You can combine the statements, that might be a bit faster:
SET QUOTED_IDENTIFIER ON
UPDATE dbo.transactions
SET transaction_name1 = REPLACE(transaction_name1,'''',''),
transaction_name2 = REPLACE(transaction_name2,'''',''),
transaction_name3 = REPLACE(transaction_name3,'''',''),
transaction_name4 = REPLACE(transaction_name4,'''',''),
transaction_name5 = REPLACE(transaction_name5,'''','')
Also, check out the estimated execution plan. It might give you a useful advice in how to optimize your database / query (it's a little square button in the button bar of SQL Management Studio).
Upvotes: 2