has_and_belongs_to_many where other record does not exist

Question

We have

has_and_belongs_to_many :questions #on game.rb
has_and_belongs_to_many :games # on question.rb

I have to find all questions which are NOT assigned to games. So, i am doing

t = []
Question.includes(:games).all.each { |q| t <<  q.id  if !q.games.present? }
puts t.size

but we have 30,000+ records so it is taking too much time to process the above query. How to deal such situation ? Is there anyway to optimize the above query so my server may not go out-of-memory or some other disaster .
Thanks

David Aldridge · Accepted Answer

If it's safe to assume that there is no row in the join table for questions not assigned to games, then:

t = Question.where("not exists (select null from games_questions gq where gq.question_id = questions.id)").count

If you need the actual question objects then omit the count, of course.

You query would be something like:

select count(*)
from   questions
where  not exists (
         select null
         from   games_questions gq
         where  gq.question_id = questions.id);

It counts all of the rows in the questions table for which there is no row returned from the correlated subquery (hence it does not matter what is put in the select clause for the correlated subquery, and I generally use NULL).

So for question where id = 5, the row is only returned where no row in the games_questions table can be found with question_id = 5.

Although a naive query optimiser might implement this as a full table scan of questions and an execution of the subquery for every row, more sophisticated optimisers will recognise this as an anti-join and implement is more efficiently. In Oracle it's likely to be a hash anti-join, although this would depend on the number of rows in each table. If questions had 10 rows and games_questions had 1,000,000 and games_questions.question_id was indexed, you'd be likely to see the more naive execution plan using a nested loop to probe the games_questions table for each row in questions.

has_and_belongs_to_many where other record does not exist

Answers (2)

Related Questions