l0b0
l0b0

Reputation: 58768

Rails: How to get objects with at least one child?

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:

SELECT *
  FROM parents
 WHERE EXISTS (
               SELECT 1
                 FROM children
                WHERE parent_id = parents.id)

The closest I've come is

Parent.all.reject { |parent| parent.children.empty? }

(based on another answer), but it's really inefficient because it runs a separate query for each Parent.

Upvotes: 24

Views: 11991

Answers (6)

Fellow Stranger
Fellow Stranger

Reputation: 34013

As of Rails 5.1, uniq is deprecated and distinct should be used instead.

Parent.joins(:children).distinct

This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.

Upvotes: 14

Junichi Ito
Junichi Ito

Reputation: 2588

The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT but it can be slow query. For better performance, you should write SQL using EXISTS:

Parent.where<<-SQL
EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL

EXISTS is much faster than DISTINCT. For example, here is a post model which has comments and likes:

class Post < ApplicationRecord
  has_many :comments
  has_many :likes
end

class Comment < ApplicationRecord
  belongs_to :post
end

class Like < ApplicationRecord
  belongs_to :post
end

In database there are 100 posts and each post has 50 comments and 50 likes. Only one post has no comments and likes:

# Create posts with comments and likes
100.times do |i|
  post = Post.create!(title: "Post #{i}")
  50.times do |j|
    post.comments.create!(content: "Comment #{j} for #{post.title}")
    post.likes.create!(user_name: "User #{j} for #{post.title}")
  end
end

# Create a post without comment and like
Post.create!(title: 'Hidden post')

If you want to get posts which have at least one comment and like, you might write like this:

# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct

The query above generates SQL like this:

SELECT DISTINCT "posts".* 
FROM "posts" 
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" 
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"

But this SQL generates 250000 rows(100 posts * 50 comments * 50 likes) and then filters out duplicated rows, so it could be slow.

In this case you should write like this:

Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL

This query generates SQL like this:

SELECT "posts".* 
FROM "posts" 
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id) 
AND 
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)

This query does not generate useless duplicated rows, so it could be faster.

Here is benchmark:

              user     system      total        real
Uniq:     0.010000   0.000000   0.010000 (  0.074396)
Exists:   0.000000   0.000000   0.000000 (  0.003711)

It shows EXISTS is 20.047661 times faster than DISTINCT.

I pushed the sample application in GitHub, so you can confirm the difference by yourself:

https://github.com/JunichiIto/exists-query-sandbox

Upvotes: 6

Chris Bailey
Chris Bailey

Reputation: 4136

Parent.joins(:children).uniq.all

Upvotes: 58

simonmenke
simonmenke

Reputation: 2880

try including the children with #includes()

Parent.includes(:children).all.reject { |parent| parent.children.empty? }

This will make 2 queries:

SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);

[UPDATE]

The above solution is usefull when you need to have the Child objects loaded. But children.empty? can also use a counter cache1,2 to determine the amount of children.

For this to work you need to add a new column to the parents table:

# a new migration
def up
  change_table :parents do |t|
    t.integer :children_count, :default => 0
  end

  Parent.reset_column_information
  Parent.all.each do |p|
    Parent.update_counters p.id, :children_count => p.children.length
  end
end

def down
  change_table :parents do |t|
    t.remove :children_count
  end
end

Now change your Child model:

class Child
  belongs_to :parent, :counter_cache => true
end

At this point you can use size and empty? without touching the children table:

Parent.all.reject { |parent| parent.children.empty? }

Note that length doesn't use the counter cache whereas size and empty? do.

Upvotes: 1

bradgonesurfing
bradgonesurfing

Reputation: 32162

You just want an inner join with a distinct qualifier

SELECT DISTINCT(*) 
FROM parents
JOIN children
ON children.parent_id = parents.id

This can be done in standard active record as

Parent.joins(:children).uniq

However if you want the more complex result of find all parents with no children you need an outer join

Parent.joins("LEFT OUTER JOIN children on children.parent_id = parent.id").
where(:children => { :id => nil })

which is a solution which sux for many reasons. I recommend Ernie Millers squeel library which will allow you to do

Parent.joins{children.outer}.where{children.id == nil}

Upvotes: 2

Soundar Rathinasamy
Soundar Rathinasamy

Reputation: 6728

I have just modified this solution for your need.

Parent.joins("left join childrens on childrends.parent_id = parents.id").where("childrents.parent_id is not null")

Upvotes: 3

Related Questions