Peter Penzov
Peter Penzov

Reputation: 1658

How to compare a word is it found in table rows?

I have this PostgreSQL table for storing words:

CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default",
);

insert into words (keyword)
VALUES ('while swam is interesting');

CREATE TABLE IF NOT EXISTS trademarks
(
   id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
   trademark character varying(300) COLLATE pg_catalog."default",
);

Into table trademarks I will have thousands of registered trademarks names. I want to compare words stored into words table keyword, do they match not only for a words but also for word which is in a group of words. For example:

I have a keyword while swam is interesting stored into words.keyword. I also have a trademark swam located in trademarks.trademark I have a word match, so I want to detect this using SQL.

I tried this:

select 
    w.id, w.keyword, t.trademark 
from
    words w
inner join 
    trademarks t 
on
  t.trademark ilike  '%'||w.keyword||'%' 
where 
   w.keyword = 'all';

But I get this result:

enter image description here

As you can see, the result is wrong. I need to match complete words, not parts of the words. How I can fix this?

Upvotes: 0

Views: 111

Answers (3)

Bohemian
Bohemian

Reputation: 424983

Split the words field on space to create an array, then use any twice: once for the where (ie "any words list containing 'all') and once for the match (ie any trademark that appears in a words list that also contains 'all'):

select 
    w.id,
    w.keyword,
    t.trademark 
from words w
join trademarks t on t.trademark =
  any(string_to_array(w.keyword, ' '))
where 'all' = any(string_to_array(w.keyword, ' '))

This matches the whole word, so no need to worry about word boundaries etc.


You can tidy this up and improve performance with a CTE that unnests the words:

with w as (
    select
        id,
        keyword,
        unnest(string_to_array(keyword, ' ')) as term
    from words
    where 'all' = any(string_to_array(keyword, ' '))
)
select 
    w.id,
    w.keyword,
    t.trademark 
from w
join trademarks t on t.trademark = w.term

Add indexes to make this fly:

create index words_keywords_index on words using gin (keyword);
create index trademarks_trademark_index on trademarks(trademarks);

gin indexes index nested values.

Upvotes: 2

George Joseph
George Joseph

Reputation: 5922

One option is to use regular expression as richyen mentions. If you need to use sql what you have got is correct, except you need to look for a pattern that uses the conditions '%SPACE'||w.keyword||'SPACE%'

select w.id, w.keyword, t.trademark 
  from words w
  join trademarks t 
    on w.keyword like '% '||t.trademark||' %' 
 where t.trademark = 'all'
union all /*condition when there are only single words without a space*/
select w.id, w.keyword, t.trademark 
  from words w
  join trademarks t 
    on w.keyword =t.trademark
 where t.trademark = 'all';

https://dbfiddle.uk/k5bYAPUh

Upvotes: 1

mwalter
mwalter

Reputation: 132

You can split the keyword into a array then later compare if the trademark is inside of the splited array.

select 
    w.id, w.keyword, t.trademark 
from
    words w
inner join 
    trademarks t 
on
  t.trademark = ANY( string_to_array(REPLACE(w.keyword , ' ' , '~^~' ),'~^~' , '') )
where 
   w.keyword = 'all';

trademark and keywords overlaps each other example:

SELECT
ARRAY['swam'] && string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');

keywords is inside of trademark example

SELECT
ARRAY['swam'] @> string_to_array(REPLACE('while swam is interesting',' ','~^~'), '~^~' ,'');

Upvotes: 0

Related Questions