artful dodger
artful dodger

Reputation: 77

selecting based on all conditions being met (relational division)

I have the tables question, topic and question_has_topic (many-to-many relationship). In my application admins see a breakdown for questions grouped by their topics and they select how many from each they'd like the system to randomly select to create a test.
This is the kind of table they see:

+-----------------------+---------------------+------------+
|        Topics         | Questions available | Selection: | 
+-----------------------+---------------------+------------+
| health,safety,general |                  13 |            |
| health                |                   3 |            |
| safety                |                   7 |            |
| general               |                   1 |            |
+-----------------------+---------------------+------------+

The count is unique for the particular grouping of topics. Anyway, once they make the selection I need a SQL statement which will select questions that correspond to the given grouping of topics. I.e. I might need 3 questions which have the topics health,safety and general.
I was doing some research online and I think that what I'm trying to do is known as divide in relational algebra and here is my attempt for an arbitrary grouping of topicids:

select questionid from question_has_topic
where not exists (
    select questionid from question_has_topic
    where topicid not in (8,9,10))

The result is empty, although there are 2 questions in the database that have all these topic ids which tells me this isn't working. I was following the example from this link

Upvotes: 1

Views: 388

Answers (2)

Cheran Shunmugavel
Cheran Shunmugavel

Reputation: 8459

EDIT: removed my old post since I misread the question.


This is a technique I've used in the past:

  SELECT qht.questionid
    FROM question_has_topic AS qht
   WHERE qht.topicid IN (8,9,10)
GROUP BY qht.questionid
  HAVING COUNT(*) = 3 AND
         COUNT(*) = (SELECT COUNT(*) FROM question_has_topic AS dupe
                     WHERE dupe.questionid = qht.questionid)

where 3 corresponds to the number of topics in the given group. This assumes that each (questionid, topicid) pair in question_has_topic is unique (which it should be in a many-to-many relationship table).

The way this query works is by first selecting any question that has at least one of the desired topics assigned to it (WHERE qht.topicid IN (8,9,10)), then grouping by the questionid. The first HAVING clause (COUNT(*) = 3) can only be true if a given question has all three topics assigned to it (since we assume duplicates aren't allowed in this table). The second HAVING clause checks the total number of topics that are assigned to the question. This is to guard against the case where, for example, a question may have topics 8, 9, 10, and 11 assigned to it.

Upvotes: 2

user1191247
user1191247

Reputation: 12998

I think this is what you were trying to write but it is a very inefficient way of doing it -

SELECT questionid FROM question WHERE NOT EXISTS (
    SELECT topicid FROM topic WHERE topicid NOT IN (
        SELECT topicid FROM question_has_topic WHERE question.questionid = question_has_topic.questionid
    ) AND topicid IN (8, 9, 10)
);

This is definitely much faster -

SELECT *
FROM question_has_topic t1
INNER JOIN question_has_topic t2
    ON t1.questionid = t2.questionid AND t2.topicid = 9
INNER JOIN question_has_topic t3
    ON t2.questionid = t3.questionid AND t3.topicid = 10
WHERE t1.topicid = 8;

UPDATE: I knew there was a simpler answer. Cheran's method is much simpler and should run slightly faster than the INNER JOINs. Please accept his answer.

Upvotes: 2

Related Questions