KindFrog
KindFrog

Reputation: 378

Select rows that do not contain a word from another table

I have a table with one word each row and a table with some text in a row. I need to select from the second table only those rows that does not contain words from the first table.

For example:

Table with constratint words

constraint_word
example
apple
orange
mushroom
car
qwerty

Table with text

text
word1. apple; word3, example
word1, apple, word2. car
word1 word2 orange word3
mushroomword1 word2 word3
word1 car
qwerty

Nothing should be selected in this case, because every row in the second table contains words from the first table.

I only have an idea to use CROSS JOIN to achive this

SELECT DISTINCT text FROM text_table CROSS JOIN words_table
           WHERE CONTAINS(text, constraint_word ) = 0

Is there a way to do it without using CROSS JOIN?

Upvotes: 0

Views: 1275

Answers (3)

hc_dev
hc_dev

Reputation: 9377

Your idea is fine, since you need to test all words for each text. This is what CROSS JOIN does - a combination (cartesian product).

We can even be more restrictive for better performance and use INNER JOIN, or the shorthand JOIN.

See also: CROSS JOIN vs INNER JOIN in SQL

Additionally you need to filter all text records, where there are no matches at all. This means the count of non-matches over all combinations per text is maximum (= number of constraint_words, here 6). This filter can be done using GROUP BY WITH HAVING

-- text without any constaint_word
SELECT t.text, count(*)
FROM text_table t
JOIN words_table w ON CONTAINS(t.text, w.constraint_word, 1) = 0
GROUP BY t.text
HAVING count(*) = (SELECT count(*) FROM words_table)
;

It will output:

text count(*)
mushroomword1 word2 word3 6

Try the demo on on SQL Fiddle

Entire-word vs partial matches

Note that 'mushroom' from constraint words is not matched by CONTAINS because it is contained as word-part not as entire word.

For partial-matches you can use INSTR as answered by Littlefoot.

See also

Upvotes: 1

Littlefoot
Littlefoot

Reputation: 142705

contains means Oracle Text; cross join means Cartesian product (usually performance nightmare).

One option which avoids both of these is instr function (which checks existence of the constraint_word in text, but this time using inner join) and the minus set operator.

Something like this, using sample data you posted:

SQL> select * from text_table;

TEXT
---------------------------
word1.apple; word3, example
word1, apple, word2.car
word1 word2 orange word3
mushroomword1 word2 word3
word1 car
qwerty

6 rows selected.

SQL> select * From words_table;

CONSTRAI
--------
example
apple
orange
mushroom
car
qwerty

6 rows selected.

SQL>

As you said, initially query shouldn't return anything because all constraint_words exist in text:

SQL> select c.text
  2  from text_table c
  3  minus
  4  select b.text
  5  from words_table a join text_table b on instr(b.text, a.constraint_word) > 0;

no rows selected

Let's modify one of text rows:

SQL> update text_table set text = 'xxx' where text = 'qwerty';

1 row updated.

What's the result now?

SQL> select c.text
  2  from text_table c
  3  minus
  4  select b.text
  5  from words_table a join text_table b on instr(b.text, a.constraint_word) > 0;

TEXT
---------------------------
xxx

SQL>

Right; text we've just modified.

Upvotes: 1

esramish
esramish

Reputation: 327

I believe this works (I think the issue with the CROSS JOIN route is that it includes any texts that don't contain at least one of the words--not just texts that don't contain any):

SELECT DISTINCT text FROM text_table WHERE (SELECT COUNT(*) FROM words_table WHERE CONTAINS(text, constraint_word)) = 0;

Upvotes: 0

Related Questions