jkl
jkl

Reputation: 675

Optimize finding the Nth occurrence of character in string

I wrote a sql server function which returns substring before the Nth occurence of character.

For example, SELECT dbo.fn_getFirstNthSentence('.', 'hello world.It.is.raining.today', 3) returns 'hello world.It.Is.' as a result.

The function I wrote looks dirty and slow so I want to optimize it. Any advice to make it clean is appreciated.

Thank you.

CREATE FUNCTION fn_getFirstNthSentence
(
  @TargetStr VARCHAR(MAX) ,
  @SearchedStr VARCHAR(8000) ,
  @Occurrence INT
)
RETURNS varchar(MAX)
AS
BEGIN

    DECLARE @pos INT ,
        @counter INT ,
        @ret INT;

    SET @pos = CHARINDEX(@TargetStr, @SearchedStr);

    IF ( @pos = 0 )
        RETURN @SearchedStr

    SET @counter = 1;

    IF @Occurrence = 1
        SET @ret = @pos;

    ELSE
        BEGIN

            WHILE ( @counter < @Occurrence )
                BEGIN

                    IF(LEN(@SearchedStr) < @pos + 1)
                        RETURN @SearchedStr

                    SELECT  @ret = CHARINDEX(@TargetStr, @SearchedStr,
                                             @pos + 1);
                    IF(@ret = 0)
                        RETURN @SearchedStr
                    SET @counter = @counter + 1;
                    SET @pos = @ret;
                END;
        END;
    RETURN LEFT(@SearchedStr, @ret)
END;

Upvotes: 2

Views: 5187

Answers (3)

John Cappelletti
John Cappelletti

Reputation: 81930

Another option is via XML

I can't see your benchmarks, but it is certainly far less code. An added option could be Find the 3rd through 5th occurrence by adding a parameter and changing the Where Seq<=@FindPos to Where Seq Between range1 and range2.

Declare @FindPos int = 3
Declare @String  varchar(max) = 'hello world.It.is.raining.today'
Declare @Delim   varchar(10)  = '.'


Declare @XML xml,@RetVal varchar(max) = ''
Set @XML = Cast('<x>' + Replace(@String,@Delim,'</x><x>')+'</x>' as XML)

Declare @Table table (Seq int identity(1,1),String varchar(max))
Insert Into @Table Select ltrim(rtrim(String.value('.', 'varchar(max)')))+@Delim as value FROM @XML.nodes('x') as T(String)

Select @RetVal=@RetVal + String from @Table Where Seq<=@FindPos Order By Seq

Select @RetVal

Returns

hello world.It.is.

EDIT: If it helps, below is my generic parsing function which returns a normalized table...

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--       Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--       Select * from [dbo].[udf-Str-Parse]('id26,id46|id658,id967','|')
--       Select * from [dbo].[udf-Str-Parse]('hello world. It. is. . raining.today','.')

Returns @ReturnTable Table (Key_PS int IDENTITY(1,1), Key_Value varchar(max))
As
Begin
   Declare @XML xml;Set @XML = Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML)
   Insert Into @ReturnTable Select Key_Value = ltrim(rtrim(String.value('.', 'varchar(max)'))) FROM @XML.nodes('x') as T(String)
   Return 
End

So for example:

Select * from [dbo].[udf-Str-Parse]('hello world.It.is.raining.today','.')

Returns

Key_PS  Key_Value
1       hello world
2       It
3       is
4       raining
5       today

Upvotes: 1

Sean Lange
Sean Lange

Reputation: 33571

Here is yet another option using a delimited string splitter. The XML method already posted is a good one but this approach does not require a table variable.

This is created as an inline table valued function which should keep the performance really fast.

create function fn_getFirstNthSentence
(
    @SearchedStr varchar(100)
    , @Occurrence int
    , @Delimiter char(1)
) returns table as return

    with ParsedValues as
    (
        select Item
            , ItemNumber
        from dbo.DelimitedSplit8K(@SearchedStr, @Delimiter)
        where ItemNumber <= @Occurrence
    )

    select top 1 ResultString = STUFF(
    (
        select @Delimiter + Item
        from ParsedValues
        order by ItemNumber
        for xml path('')), 1,1, '') + @Delimiter
    from ParsedValues

This is also using a splitter created by Jeff Moden. It has one feature that none of the other splitter have...a column to indicate which position the value came from. You can find his article an ensuing discussion here. http://www.sqlservercentral.com/articles/Tally+Table/72993/

Then if you want to execute it you can do this quite simply.

declare @String varchar(100) = 'hello world.It.is.raining.today.'
    , @Num int = 3
    , @Delimiter char(1) = '.'
;

select *
from fn_getFirstNthSentence(@String, @Num, @Delimiter)

If you don't like Jeff Moden's splitter you can find several other options here. http://sqlperformance.com/2012/07/t-sql-queries/split-strings I don't use Moden's for everything but when you need to keep the parsed values in order it is awesome.

--EDIT--

Here is how you could modify this to become a scalar function instead of an inline table valued function. My preference would be to keep the itvf as they are faster and more flexible.

create function fn_getFirstNthSentenceScalar
(
    @SearchedStr varchar(100) = 'hello world.It.is.raining.today.this is after 5'
    , @Occurrence int = 5
    , @Delimiter char(1) = '.'
) returns varchar(max) as begin

    declare @RetVal varchar(max);

    with ParsedValues as
    (
        select Item
            , ItemNumber
        from dbo.DelimitedSplit8K(@SearchedStr, @Delimiter)
        where ItemNumber <= @Occurrence
    )

    select top 1 @RetVal = STUFF(
    (
        select @Delimiter + Item
        from ParsedValues
        order by ItemNumber
        for xml path('')), 1,1, '') + @Delimiter
    from ParsedValues;

    return @RetVal
end

Upvotes: 2

Cato
Cato

Reputation: 3701

--I find these functions to be a mine-field, and at the risk of stepping on a mine I've tried some simplifications - maybe a microscopic improvement in performance

alter FUNCTION fn_getFirstNthSentence
(
@TargetStr VARCHAR(MAX) ,
@SearchedStr VARCHAR(8000) ,
@Occurrence INT
)
RETURNS varchar(MAX)
AS
BEGIN

DECLARE @pos INT ,
    @counter INT ;

IF @Occurrence < 1  
    RETURN NULL;

SELECT @counter = 0, @POS = 1;

WHILE (@counter < @Occurrence AND @POS > 0)
BEGIN

    SELECT  @POS = CHARINDEX(@TargetStr, @SearchedStr,
                                @pos + 1);
    IF @POS > 0 
        SET  @counter = @counter + 1;

END;
RETURN CASE WHEN @POS > 0 THEN
            LEFT(@SearchedStr, @POS)
        ELSE
            @SearchedStr
        END;

END;

Upvotes: 1

Related Questions