Reputation: 3874
I would like to know the most efficient way of removing any occurrence of characters like , ; / " from a varchar column.
I have a function like this but it is incredibly slow. The table has about 20 million records.
CREATE FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
returns VARCHAR(255)
AS
BEGIN
DECLARE @o VARCHAR(255)
SET @o = Replace(@s, '/', '')
SET @o = Replace(@o, '-', '')
SET @o = Replace(@o, ';', '')
SET @o = Replace(@o, '"', '')
RETURN @o
END
Upvotes: 3
Views: 5075
Reputation: 453628
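Whichever method you use, it is probably worth adding a filter such as
WHERE YourCol LIKE '%[-/;"]%'
unless you suspect that a very large proportion of rows will in fact contain at least one of the characters that need to be stripped. The hyphen comes first in the bracketed set so that it is treated as a literal character; written as [/-;"] it would instead define a range from / to ; (which includes the digits) and would not match - at all.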
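As a rough sketch of how that filter could be combined with the function from the question: the table name dbo.YourTable is a placeholder (the question does not name the table), and YourCol stands for the varchar column being cleaned.

```sql
-- dbo.YourTable and YourCol are placeholder names, not taken from the question.
-- Only rows that contain at least one of the unwanted characters are touched,
-- so unchanged rows are neither rewritten nor logged.
UPDATE dbo.YourTable
SET    YourCol = dbo.Udf_getcleanedstring(YourCol)
WHERE  YourCol LIKE '%[-/;"]%';  -- leading hyphen is a literal, not a range
```

The leading wildcard means the predicate still scans every row, but it avoids writing rows that would not change.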
As you are using this in an UPDATE statement, simply adding the WITH SCHEMABINDING attribute to the function can massively improve things. It allows the UPDATE to proceed row by row rather than needing to cache the entire operation in a spool first for Halloween Protection.
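A minimal sketch of that change, assuming the function from the question already exists: the body is unchanged and only the WITH SCHEMABINDING line is new.

```sql
-- Same body as the function in the question; only WITH SCHEMABINDING is added.
ALTER FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @o VARCHAR(255)
    SET @o = Replace(@s, '/', '')
    SET @o = Replace(@o, '-', '')
    SET @o = Replace(@o, ';', '')
    SET @o = Replace(@o, '"', '')
    RETURN @o
END
```

Comparing the UPDATE's execution plan before and after the change should show whether the spool has gone.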
Nested REPLACE calls in TSQL are slow anyway though, as they involve multiple passes through the strings.
You could knock up a CLR function as below (if you haven't worked with these before then they are very easy to deploy from an SSDT project as long as CLR execution is permitted on the server). The UPDATE plan for this too does not contain a spool.
The regular expression uses (?:) to denote a non-capturing group, with the various characters of interest separated by the alternation character | as /|-|;|\" (the " needs to be escaped in the C# string literal, so it is preceded by a backslash).
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
    // Compiled once and reused for every call.
    private static readonly Regex regexStrip =
        new Regex("(?:/|-|;|\")", RegexOptions.Compiled);

    [SqlFunction]
    public static SqlString StripChars(SqlString Input)
    {
        // NULL in, NULL out; otherwise remove every matched character.
        return Input.IsNull ? null : regexStrip.Replace((string)Input, "");
    }
}
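If the assembly is registered by hand rather than deployed from SSDT, the T-SQL side might look roughly like the sketch below. The assembly name StripCharsAssembly and the DLL path are placeholders, the NVARCHAR(4000) sizes are an assumption, and newer SQL Server versions may require additional assembly-trust configuration that is not shown here.

```sql
-- Allow CLR code to run on the instance (requires appropriate permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- StripCharsAssembly and the file path are hypothetical placeholders.
CREATE ASSEMBLY StripCharsAssembly
FROM 'C:\path\to\StripChars.dll'
WITH PERMISSION_SET = SAFE;
GO
-- Bind a T-SQL function to the static C# method StripChars.
CREATE FUNCTION dbo.StripChars (@Input NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME StripCharsAssembly.UserDefinedFunctions.StripChars;
GO
```

Once registered, dbo.StripChars can be called in an UPDATE in the same way as the scalar T-SQL function.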
Upvotes: 4
Reputation: 433
I want to show the huge performance difference between two types of user-defined function, a scalar function and a table function.
See the test example:
use AdventureWorks2012
go
-- create the table for the test (in its own batch, so the repeated insert batch below does not try to re-create it)
create table dbo.FindString (ColA int identity(1,1) not null primary key,ColB varchar(max) );
go
declare @text varchar(max) = 'A web server can handle a Hypertext Transfer Protocol request either by reading
a file from its file ; system based on the URL <> path or by handling the request using logic that is specific
to the type of resource. In the case that special logic is invoked the query string will be available to that logic
for use in its processing, along with the path component of the URL.';
-- populate the table by repeating the insert batch 1,000,000 times
insert into dbo.FindString(ColB)
select @text
go 1000000
-- use one of the scalar functions posted in the answers in this thread
alter function [dbo].[udf_getCleanedString]
(
@s varchar(max)
)
returns varchar(max)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
go
--
-- create a new table function from the scalar function above
create function [dbo].[utf_getCleanedString]
(
@s varchar(max) -- same type as the scalar version, so ColB is not silently truncated to 255 characters
)
returns table
as return
(
select replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','') as String
)
go
--
-- clear the buffer cache
DBCC DROPCLEANBUFFERS ;
go
-- update process using USER TABLE FUNCTION
update Dest with(rowlock) set
dest.ColB = D.String
from dbo.FindString dest
cross apply utf_getCleanedString(dest.ColB) as D
go
DBCC DROPCLEANBUFFERS ;
go
-- update process using USER SCALAR FUNCTION
update Dest with(rowlock) set
dest.ColB = dbo.udf_getCleanedString(dest.ColB)
from dbo.FindString dest
go
And these are the execution plans: as you can see, the table function (UTF) performs much better than the scalar function (USF). Both do the same thing, replacing characters in the string, but one returns a scalar and the other returns a table.
Another important measure to look at is the I/O statistics (SET STATISTICS IO ON;).
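One way to capture those figures for the test above is sketched below. SET STATISTICS TIME is an addition that is not part of the original test, and the results appear in the Messages output rather than in the result grid.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;   -- not in the original test; added for elapsed/CPU time
GO
-- Table-function version of the update from the test script.
update dest set
    dest.ColB = D.String
from dbo.FindString dest
cross apply dbo.utf_getCleanedString(dest.ColB) as D;
GO
-- Scalar-function version of the update from the test script.
update dest set
    dest.ColB = dbo.udf_getCleanedString(dest.ColB)
from dbo.FindString dest;
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```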
Upvotes: 2
Reputation: 44
Here is a similar question that was asked previously. I like the approach mentioned there:
How to Replace Multiple Characters in SQL?
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'
Upvotes: 0
Reputation: 93181
How about nesting them together in a single call:
create function [dbo].[udf_getCleanedString]
(
@s varchar(255)
)
returns varchar(255)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
Or you may want to do an UPDATE on the table itself for the first time. Scalar functions are pretty slow.
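A sketch of such a one-off UPDATE with the nested REPLACE written inline, so that no scalar function is called per row. The names dbo.YourTable and YourCol are placeholders, and the WHERE clause is an optional extra that skips rows with nothing to strip.

```sql
-- dbo.YourTable and YourCol are placeholder names.
UPDATE dbo.YourTable
SET    YourCol = replace(replace(replace(replace(YourCol,'/',''),'-',''),';',''),'"','')
WHERE  YourCol LIKE '%[-/;"]%';  -- hyphen first so it is matched literally
```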
Upvotes: 0