Brent Butler
Brent Butler

Reputation: 13

T-SQL Syntax - word parser

I have a word list in a variable. The words are comma delimited. I am trying to save them to individual record in a database. I found another question which does this and works, but it saves every word. I tried to modify it so that I only save unique words, and count duplicate as I go. I think the logic below is correct, but my syntax is not working.

  1. if @FoundWord = 0 SET @WordsUsed = @WordsUsed + '' + @Word + ''

** is not concatenating the next word onto the end of the @WordsUsed variable

  1. if @FoundWord = 0 INSERT INTO mydata.dbo.words (ProjectNumber, WordCount, Word) VALUES( '5', '1', @Word )

** doesn't seem to be doing anything at all ... I'm getting no records written to the words table

The entire code follows:

declare @SplitOn nvarchar(5) = ','
BEGIN
DECLARE @split_on_len INT = LEN(@SplitOn)
DECLARE @start_at INT = 1
DECLARE @end_at INT
DECLARE @data_len INT
DECLARE @WordsUsed varchar(max)
DECLARE @FoundWord int
DECLARE @Word varchar(100)

Set @WordsUsed = '**'
WHILE 1=1
BEGIN
    SET @end_at = CHARINDEX(@SplitOn,@txt1,@start_at)
    SET @data_len = CASE @end_at WHEN 0 THEN LEN(@txt1) ELSE @end_at-@start_at END
    set @Word = SUBSTRING(@txt1,@start_at,@data_len)
    SET @FoundWord = CHARINDEX('*' & @Word & '*', @WordsUsed)
    if @FoundWord = 0 SET @WordsUsed = @WordsUsed & '*' & @Word & '*'
    if @FoundWord = 0 INSERT INTO mydata.dbo.words (ProjectNumber, WordCount, Word) VALUES( '5', '1', @Word )
    if @FoundWord > 0 Update mydata.dbo.words set WordCount = WordCount + 1 where projectnumber = 5 and word = @word
    IF @end_at = 0 BREAK
    SET @start_at = @end_at + @split_on_len
END

RETURN
END;

Upvotes: 0

Views: 565

Answers (2)

Clay
Clay

Reputation: 5084

Here's a function you can try. It splits an unlimited-sized string based on the delimiter of your choice. The output of the function is a table - so you can then select distinct from that to store your word list. You'd call it something like:

insert YourWordlistTable ( Word )  --> just making up table/column names here
select distinct Data 
from dbo.StringSplit( @yourVar, ',' )

Here's the definition of the function:

create function dbo.StringSplit
(
   @string nvarchar( max ),
   @delimiter nvarchar( 255 )
)
returns @t table ( Id int, Data nvarchar( 4000 ) )
as begin

    with Split( startPosition, endPosition )
    as
    (
      select 
         cast( 0 as bigint ) as startPosition, 
         charindex( @delimiter, @string ) as endPosition
      union all
      select 
         endPosition + 1, charindex( @delimiter, @string, endPosition + 1 )
      from 
         Split
      where 
         endPosition > 0
    )
    insert @t
    select 
      row_number() over ( order by ( select 1 ) ) as Id,
      substring( @string, startPosition, coalesce( nullif( endPosition, 0 ), len( @string ) + 1 ) - startPosition ) collate Latin1_General_CS_AS as Data
    from 
      Split 
    option( maxrecursion 0  );

    return;
end

I originally posted, then deleted, then reposted this when I realized I'd given an inline function I use that never got called on strings longer than 100 words. I've since modified it to support indefinite recursion - although it can't be an inline function this way.

Inline functions are generally faster because SQL can incorporate the inline function's statement into the query plan of the statements that call the function. However, that does not appear to be an option available in an inline function.

Upvotes: 1

Binesh Nambiar C
Binesh Nambiar C

Reputation: 162

why can't you use select count(distinct value) from dbo.split(',',@txt1) where @txt1 is he string will give you distinct count of words

If you are looking for word wise count

select value,count(1) from dbo.split(',',@txt1) group by value 

shoud do this

Upvotes: 0

Related Questions