MGM
MGM

Reputation: 65

Replace and like in Joins

I have two tables tbl_words with column words and tbl_keywords with column keywords and the sample data in the table is as follows

Input:

tbl_words

Word
house for rent in ohio

tbl_keywords

keywords
house
rent

Now my objective is to join these two tables and replace the matching keywords from tbl_keywords with an empty string('') in table tbl_words.

Desired Output:

tbl_words
Word
for in ohio

Schema creation for the above tables

Upvotes: 2

Views: 118

Answers (2)

Seph
Seph

Reputation: 8703

You can do this with a CTE to loop over all your words and replace all the words with blank spaces.

Demo here http://sqlfiddle.com/#!3/7563a/21

WITH CTE (org, calc, DATA, level) AS
(
    SELECT word, word, CONVERT(VARCHAR(500), ' ' + word + ' '), 0
    FROM    tbl_words

    UNION ALL

    SELECT CTE.org, CTE.data,
        CONVERT(VARCHAR(500), REPLACE(CTE.data, tbl_keywords.Keyword + ' ', '')), CTE.level + 1
    FROM    CTE
    INNER JOIN tbl_keywords ON CTE.data LIKE '% ' + tbl_keywords.Keyword + ' %'
        COLLATE Latin1_General_CS_AS
)
SELECT DISTINCT org, LTRIM(RTRIM(data)) AS DATA, level
FROM CTE
WHERE level =
    (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)

The adding of ' ' to the start and end of each word is to allow you to have words like 'acters' not affect 'characters' otherwise it would simply become 'char'. It also prevents having to deal with multiple space characters in the middle of your phrase such as ' for in ohio'.

This will need to be adjusted if you have punctuation in your sentance such as 'houses for sale. in ohio' but that can be left as an exercise for you to finish.

Edit:

depending on your performance this approach might be better (description after)...

create function [dbo].[Split] 
( 
    @string nvarchar(MAX), 
    @delimiter nvarchar(10) 
) 
returns @table table 
( 
    [Value] nvarchar(MAX) 
) 
begin 
    declare @nextString nvarchar(MAX) 
    declare @pos int, @nextPos int 
    declare @commaCheck nvarchar(1) 

    set @nextString = '' 
    set @commaCheck = right(@string, 1) 
    set @string = @string + @delimiter 

    set @pos = charindex(@delimiter, @string) 
    set @nextPos = 1 
    while (@pos <> 0) 
    begin 
        set @nextString = substring(@string, 1, @pos - 1) 

        insert into @table 
        ( 
            [Value] 
        ) 
        values 
        ( 
            @nextString 
        ) 

        set @string = substring(@string, @pos + 1, len(@string)) 
        set @nextPos = @pos 
        set @pos = charindex(@delimiter, @string) 
    end 
    return 
end

with the query now becoming:

SELECT RTRIM(combined) AS [words] FROM
(SELECT word, 
(
select [value] + ' ' as [text()]
from dbo.split(word, ' ')
where [value] not in (SELECT [keyword] from tbl_keywords)
FOR XML PATH('')
)
 as [combined] 
from tbl_words) x;

Basically we split the phrase into individual words, remove the ones which are key-words and then concatenate them back together using FOR XML PATH.

On my system this runs 6 times faster, but it will depend how long your phrases are and how many words you have. To further improve performance put a (ideally unique) index on [tbl_keywords].[keyword].

Again this won't work for punctuation so you will need to adjust everything accordingly for that if you need it.

Upvotes: 4

vhadalgi
vhadalgi

Reputation: 7189

SELECT replace(w.word, k.keyword, '') as REPLACED_KEYWORDS
FROM #tbl_words w
JOIN #tbl_keywords k
ON w.word LIKE '%' + k.keyword + '%'

SEE DEMO

Upvotes: 0

Related Questions