migu
migu

Reputation: 4336

Postgres: Inner Join with AND condition on same field

One Quiz can have many Submissions. I want to fetch all Quizzes that have at least one associated Submission with submissions.correct = t and at least one associated Submission with submissions.correct = f.

How can I fix the following query and the WHERE statement in particular to make that happen:

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
WHERE  ( submissions.correct = 'f' )
       AND ( submissions.correct = 't' )
GROUP  BY quizzes.id
ORDER  BY submissions_count ASC

Update:

Here is the missing information:

I need all row data from Quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first).

k-voc_development=# \d quizzes;
                                         Table "public.quizzes"
       Column   |            Type             |                      Modifiers                       
    ------------+-----------------------------+------------------------------------------------------
     id         | integer                     | not null default nextval('quizzes_id_seq'::regclass)
     question   | character varying           | not null
     created_at | timestamp without time zone | not null
     updated_at | timestamp without time zone | not null
    Indexes:
        "quizzes_pkey" PRIMARY KEY, btree (id)
    Referenced by:
        TABLE "submissions" CONSTRAINT "fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)
        TABLE "answers" CONSTRAINT "fk_rails_431b8a33a3" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)

    k-voc_development=# \d submissions;
                                         Table "public.submissions"
       Column   |            Type             |                        Modifiers                         
    ------------+-----------------------------+----------------------------------------------------------
     id         | integer                     | not null default nextval('submissions_id_seq'::regclass)
     quiz_id    | integer                     | not null
     correct    | boolean                     | not null
     created_at | timestamp without time zone | not null
     updated_at | timestamp without time zone | not null
    Indexes:
        "submissions_pkey" PRIMARY KEY, btree (id)
        "index_submissions_on_quiz_id" btree (quiz_id)
    Foreign-key constraints:
        "fk_rails_04e433a811" FOREIGN KEY (quiz_id) REFERENCES quizzes(id)

    k-voc_development=# 

Upvotes: 2

Views: 1911

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658092

The best solution depends on the details of your implementation, data distribution and requirements.

If you have a typical installation with referential integrity (FK constraint) and defined submissions.correct as boolean NOT NULL and just want the quiz_id along with a count of total submissions, then you don't need to join in quizzes at all and this should be fastest:

SELECT quiz_id, count(*) AS ct
FROM   submissions
-- WHERE  correct IS NOT NULL -- only relevant if correct can be NULL
GROUP  BY 1
HAVING bool_or(correct)
AND    bool_or(NOT correct);

The dedicated aggregate function bool_or() is particularly useful for your test with boolean values. Simpler and faster than a CASE expression or similar constructs.

There are many other techniques, the best solution depends on the missing information.

For your updated requirements

I need all row data from quizzes. I only need the count for ordering within the query (the quizzes with the least amount of submissions first).

This should be fastest if lots of quizzes qualify (high percentage of total).

SELECT q.*
FROM  (
   SELECT quiz_id, count(*) AS ct
   FROM   submissions
   GROUP  BY 1
   HAVING count(*) > count(correct OR NULL)
   ) s
JOIN   quizzes q ON q.id = s.quiz_id
ORDER  BY s.ct;

count(*) > count(correct OR NULL) works because correct is boolean NOT NULL. Should be slightly faster than the variant above for few submissions per quiz.

Upvotes: 2

wildplasser
wildplasser

Reputation: 44250

-- I want to fetch all Quizzes
SELECT * FROM quizzes q
WHERE EXISTS ( -- that have at least one associated Submission with submissions.correct = t
    SELECT * FROM submissions s
    WHERE s.quiz_id = q.id AND s.correct = 't'
    )
AND EXISTS ( -- and at least one associated Submission with submissions.correct = f.
    SELECT * FROM submissions s
    WHERE s.quiz_id = q.id AND s.correct = 'f'
    );

Upvotes: 2

Pரதீப்
Pரதீப்

Reputation: 93754

Move your where clause to Having Clause with Conditional Count aggregate

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
GROUP  BY quizzes.id
HAVING Count(CASE WHEN submissions.correct = 'f' THEN 1 END) >= 1
        and Count(CASE WHEN submissions.correct = 't' THEN 1 END) >= 1
ORDER  BY submissions_count ASC 

Upvotes: 1

jarlh
jarlh

Reputation: 44796

If there are no other submissions.correct values than t and f, then this will work:

SELECT quizzes.*,
       Count(submissions.id) AS submissions_count
FROM   "quizzes"
       INNER JOIN "submissions"
               ON "submissions"."quiz_id" = "quizzes"."id"
GROUP  BY quizzes.id
HAVING COUNT(DISTINCT submissions.correct) >= 2
ORDER  BY submissions_count ASC 

Upvotes: 1

Related Questions