Reputation: 81493
I am trying to write an efficient function to use in a calculated field which has the following characteristics
Example input
A B@#%$$C &^%D
Example output
a b c d
A normal regex pattern would match like so
[\W_]+
The following works, however I am not sure if there is a more efficient approach than using 2 loops ( O(n2) complexity at least) with PatIndex
and Stuff
, charindex
and replace
Create Function [dbo].[Clean](@Temp nvarchar(1000))
Returns nvarchar(1000)
AS
Begin
Declare @Pattern as varchar(50) = '%[^a-z0-9 ]%'
While PatIndex(@Pattern, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@Pattern, @Temp), 1, ' ')
while charindex(' ',@Temp ) > 0
set @Temp = replace(@Temp, ' ', ' ')
Return LOWER(TRIM(@Temp))
End
Usage
Select dbo.Clean(' A B@#%$$C &^%D ')
Result
a b c d
Is there potentially a single pass approach I can use, or a sneaky method I am not aware of?
Upvotes: 0
Views: 883
Reputation: 29943
I'm not able to test the performance, but the following approach (without loops and based on some string manipulations) is an additional option.
Note, that you'll need at least SQL Server 2017 (for the TRANSLATE()
call).
-- Input text and patterns
DECLARE @text varchar(1000) = ' A B@#%$$C &^%D'
DECLARE @alphanumericpattern varchar(36) = 'abcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @notalphanumericpattern varchar(1000)
-- Trim and lower the input text
SELECT @text = RTRIM(LTRIM(LOWER(@text)))
-- Get not alpha-numeric characters
SELECT @notalphanumericpattern =
REPLACE(
TRANSLATE(@text, @alphanumericpattern, REPLICATE('a', LEN(@alphanumericpattern))),
'a',
''
)
-- Replace all not alpha-numeric characters with a space
SELECT @text =
REPLACE(
TRANSLATE(@text, @notalphanumericpattern, REPLICATE('$', LEN(@notalphanumericpattern))),
'$',
' '
)
-- Replace multiple spaces with a single space
SELECT @text =
REPLACE(
REPLACE(
REPLACE(
@text,
' ',
'<>'
),
'><',
''
),
'<>',
' '
)
Result:
a b c d
Upvotes: 1