Reputation: 65
I have two tables tbl_words with column words and tbl_keywords with column keywords and the sample data in the table is as follows
Input:
tbl_words
Word house for rent in ohio
tbl_keywords
keywords house rent
Now my objective is to join these two tables and replace the matching keywords from tbl_keywords with an empty string('') in table tbl_words.
Desired Output:
tbl_words Word for in ohio
Schema creation for the above tables
Upvotes: 2
Views: 118
Reputation: 8703
You can do this with a CTE to loop over all your words and replace all the words with blank spaces.
Demo here http://sqlfiddle.com/#!3/7563a/21
WITH CTE (org, calc, DATA, level) AS
(
SELECT word, word, CONVERT(VARCHAR(500), ' ' + word + ' '), 0
FROM tbl_words
UNION ALL
SELECT CTE.org, CTE.data,
CONVERT(VARCHAR(500), REPLACE(CTE.data, tbl_keywords.Keyword + ' ', '')), CTE.level + 1
FROM CTE
INNER JOIN tbl_keywords ON CTE.data LIKE '% ' + tbl_keywords.Keyword + ' %'
COLLATE Latin1_General_CS_AS
)
SELECT DISTINCT org, LTRIM(RTRIM(data)) AS DATA, level
FROM CTE
WHERE level =
(SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
The adding of ' '
to the start and end of each word is to allow you to have words like 'acters'
not affect 'characters'
otherwise it would simply become 'char'
. It also prevents having to deal with multiple space characters in the middle of your phrase such as ' for in ohio'
.
This will need to be adjusted if you have punctuation in your sentance such as 'houses for sale. in ohio'
but that can be left as an exercise for you to finish.
Edit:
depending on your performance this approach might be better (description after)...
create function [dbo].[Split]
(
@string nvarchar(MAX),
@delimiter nvarchar(10)
)
returns @table table
(
[Value] nvarchar(MAX)
)
begin
declare @nextString nvarchar(MAX)
declare @pos int, @nextPos int
declare @commaCheck nvarchar(1)
set @nextString = ''
set @commaCheck = right(@string, 1)
set @string = @string + @delimiter
set @pos = charindex(@delimiter, @string)
set @nextPos = 1
while (@pos <> 0)
begin
set @nextString = substring(@string, 1, @pos - 1)
insert into @table
(
[Value]
)
values
(
@nextString
)
set @string = substring(@string, @pos + 1, len(@string))
set @nextPos = @pos
set @pos = charindex(@delimiter, @string)
end
return
end
with the query now becoming:
SELECT RTRIM(combined) AS [words] FROM
(SELECT word,
(
select [value] + ' ' as [text()]
from dbo.split(word, ' ')
where [value] not in (SELECT [keyword] from tbl_keywords)
FOR XML PATH('')
)
as [combined]
from tbl_words) x;
Basically we split the phrase into individual words, remove the ones which are key-words and then concatenate them back together using FOR XML PATH.
On my system this runs 6 times faster, but it will depend how long your phrases are and how many words you have. To further improve performance put a (ideally unique) index on [tbl_keywords].[keyword].
Again this won't work for punctuation so you will need to adjust everything accordingly for that if you need it.
Upvotes: 4