Reputation: 8122
We have
has_and_belongs_to_many :questions #on game.rb
has_and_belongs_to_many :games # on question.rb
I have to find all questions which are NOT assigned to games. So, i am doing
t = []
Question.includes(:games).all.each { |q| t << q.id if !q.games.present? }
puts t.size
but we have 30,000+ records so it is taking too much time to process the above query. How to deal such situation ? Is there anyway to optimize the above query so my server may not go out-of-memory or some other disaster .
Thanks
Upvotes: 1
Views: 93
Reputation: 52356
If it's safe to assume that there is no row in the join table for questions not assigned to games, then:
t = Question.where("not exists (select null from games_questions gq where gq.question_id = questions.id)").count
If you need the actual question objects then omit the count, of course.
You query would be something like:
select count(*)
from questions
where not exists (
select null
from games_questions gq
where gq.question_id = questions.id);
It counts all of the rows in the questions table for which there is no row returned from the correlated subquery (hence it does not matter what is put in the select clause for the correlated subquery, and I generally use NULL).
So for question where id = 5, the row is only returned where no row in the games_questions table can be found with question_id = 5.
Although a naive query optimiser might implement this as a full table scan of questions and an execution of the subquery for every row, more sophisticated optimisers will recognise this as an anti-join and implement is more efficiently. In Oracle it's likely to be a hash anti-join, although this would depend on the number of rows in each table. If questions had 10 rows and games_questions had 1,000,000 and games_questions.question_id was indexed, you'd be likely to see the more naive execution plan using a nested loop to probe the games_questions table for each row in questions.
Upvotes: 1