Reputation: 7456
Currently, my code reads like this:
current_user.association.includes(a: [:b, {c: :d}, {e: :f}]).to_a
When doing a call, it seems every single includes is called through its own SELECT
call to the DB.
However, when I do current_user.association.eager_load(a: [:b, {c: :d}, {e: :f}]).to_a
I see one huge SELECT
call.
I ask because I haven't seen this raised before. I would assume that the eager_load
is more efficient due to less DB calls.
Upvotes: 1
Views: 980
Reputation: 6674
As I can't infer the query from your description (a: [:b, {c: :d}, {e: :f}])
, I need to talk about includes
for a little bit.
includes
is a query method which accommodates in different situations.
Here are some example code:
# model and reference
class Blog < ActiveRecord::Base
has_many :posts
# t.string "name"
# t.string "author"
end
class Post < ActiveRecord::Base
belongs_to :blog
# t.string "title"
end
# seed
(1..3).each do |b_id|
blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
(1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end
In one case, it fires two separate queries, just like preload
.
> Blog.includes(:posts)
Blog Load (2.8ms) SELECT "blogs".* FROM "blogs"
Post Load (0.7ms) SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)
In another case, when querying on the referenced table, it fires only one LEFT OUTER JOIN
query, just like eager_load
.
> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
SQL (0.3ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ? [["title", "Post 1-1"]]
So, I think you may asking for the different part of includes
and eager_load
, which is
Should we use two separate queries or one LEFT OUTER JOIN
query for the sake of efficiency and optimisation?
This also confuses me. After some digging, I've found this article by Fabio Akita convinced me. Here are some references and example:
For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.
The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.
Example for query data from Rails
> Blog.eager_load(:posts).map(&:name).count
SQL (0.9ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
=> 3
Example for SQL data returned from LEFT OUTER JOIN
query
sqlite> SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3
We got the expected result from Rails, but bigger result from SQL. And that's the efficiency lose for the LEFT OUTER JOIN
.
So my conclusion is, prefer includes
over eager_load
.
I've concluded a blog post about Preload, Eager_load, Includes, References, and Joins in Rails while researching. Hope this can help.
Reference
Upvotes: 4
Reputation: 7456
So, as it turns out, at one point ActiveRecord actually attempted to get everything into one query, but then opted it wasn't such a good idea.
I explored this with my query above and 4000 records.
A quick analysis:
eager_load took 2,600 milliseconds. includes took 72 milliseconds.
eager_load took 36 times as long as includes.
Upvotes: 1