Reputation: 999
In order to make a single DB query, I am eager loading Posts along with their translation data (using Rails 6 and Mobility (*)), but instead it yields 2 SQL queries:
# app/models/post.rb
class Post < ApplicationRecord
extend Mobility
translates :title, :description, backend: :table
end
# app/controllers/posts_controller.rb
class PostsController < ApplicationRecord
def index
@posts = Post.eager_load(:translations).limit(3)
end
end
<%# app/views/posts/index.html.erb %>
<% @posts.each do |post| %>
<h1><%= post.title %></h1>
<div><%= post.description %></div>
<% end %>
Result:
SELECT DISTINCT "posts"."id" FROM "posts"
LEFT OUTER JOIN "post_translations"
ON "post_translations"."post_id" = "posts"."id" LIMIT $1 [["LIMIT", 3]]
SELECT "posts"."id" AS t0_r0, "posts"."created_at" AS t0_r1, "posts"."updated_at" AS t0_r2, "post_translations"."id" AS t1_r0, "post_translations"."locale" AS t1_r1, "post_translations"."created_at" AS t1_r2, "post_translations"."updated_at" AS t1_r3, "post_translations"."title" AS t1_r4, "post_translations"."description" AS t1_r5, "post_translations"."post_id" AS t1_r6 FROM "posts"
LEFT OUTER JOIN "post_translations"
ON "post_translations"."post_id" = "posts"."id"
WHERE "posts"."id" IN ($1, $2, $3) [["id", "00060a7d-b846-5fc5-a372-1fc3462c695c"], ["id", "008db504-6fb4-5e90-bdca-4293ebe6d920"], ["id", "034944c1-4067-5ae5-89aa-4777ef14d66b"]]
How can this double SQL statement with in-memory of IDs be avoided?
(*) A note mon Mobility
The Mobility documentation has examples yielding a single SQL statement, but as pointed out by Chris Salzberg, its query API is not used at all in this example so should not be the culprit. To try to demonstrate that the issue might not be related to Mobility but Active Record itself, below is a somewhat equivalent code stripped off of Mobility, which shows the same double-querying issue (NB: this is just for demonstration purposes, as I do want to keep using Mobility):
class Post < ApplicationRecord
has_many :translations, ->{ where(locale: I18n.locale) }
%i(title description).each do |attr|
define_method(attr) do
translations.first.send(attr)
end
end
class Translation < ApplicationRecord; end
end
<%# app/views/posts/index.html.erb %>
<% Post.eager_load(:translations).limit(3).each do |post| %>
<h1><%= post.title %></h1>
<div><%= post.description %></div>
<% end %>
Upvotes: 0
Views: 1007
Reputation: 599
The main reason for two queries is the LIMIT + DISTINCT and performance optimization done by rails under the hood.
You'll not like the consequences if somehow you'll force rails to run this in single query and keep the LIMIT.
Because of the DISTINCT "posts"."id"
, rails manages to avoid
DISTINCT "posts"."*"
. Having the last one condition will force DB to compare records in full, instead of just the primary keys!
And I'm 100% sure that in case:
SELECT DISTINCT "posts"."id" FROM "posts"
LEFT OUTER JOIN "post_translations"
ON "post_translations"."post_id" = "posts"."id" LIMIT $1 [["LIMIT", 3]]
under the hood, this request happens to be Index Only Scan, unless of course you skipped corresponding indexes.
Here the difference between such distincts, on a small size dataset:
4-5 times faster!
Ths is how the request look like whenever you dont need the DISTINCT
:
Upvotes: 1
Reputation: 102250
This is not a Mobility issue. It's a basic SQL issue. Let's have a look at that very dense query used to fetch the translations:
SELECT "posts"."id" AS t0_r0,
"posts"."created_at" AS t0_r1,
"posts"."updated_at" AS t0_r2,
"post_translations"."id" AS t1_r0,
"post_translations"."locale" AS t1_r1,
"post_translations"."created_at" AS t1_r2,
"post_translations"."updated_at" AS t1_r3,
"post_translations"."title" AS t1_r4,
"post_translations"."description" AS t1_r5,
"post_translations"."post_id" AS t1_r6
FROM "posts"
LEFT OUTER JOIN "post_translations"
ON "post_translations"."post_id" = "posts"."id"
WHERE "posts"."id" IN ($1,
$2,
$3) [["id", "00060a7d-b846-5fc5-a372-1fc3462c695c"],["id","008db504-6fb4-5e90-bdca-4293ebe6d920"]],
While it's a bit daunting with the aliases what is actually going on here is that Rails is fetching every combination of post and post_translations and what you get in the result is something like:
posts.id | posts.created_at | post_translations.id ...
-------------------------------------------------------
1 | 2025-2-16 | 91
1 | 2025-2-16 | 92
1 | 2025-2-16 | 94
1 | 2025-2-16 | 98
2 | 2025-2-16 | 123
6 | 2025-2-16 | 136
6 | 2025-2-16 | 136
It's one row for every combination of posts and post_translations and the stuff from posts is repeated on every row. ActiveRecord then bridges the object relational mismatch by looping through the rows and assigning the Post::Translation instances to the Post instances.
If you applied a LIMIT clause to this query it won't just apply to the posts. It applies to the combinations of both. So what you actually get may be:
posts.id | posts.created_at | post_translations.id ...
-------------------------------------------------------
1 | 2025-2-16 | 91
1 | 2025-2-16 | 92
1 | 2025-2-16 | 94
Or:
posts.id | posts.created_at | post_translations.id ...
-------------------------------------------------------
1 | 2025-2-16 | 91
2 | 2025-2-16 | 5867
9 | 2025-2-16 | 5467
ActiveRecord is actually being smart enough to detect that you're using eager loading together with a limit and splits it into two queries so that you get a result that makes sense.
If you really want to shoehorn this into a single query you have to use other ways of restricting the results like for example a subquery:
# There are most likely more effienct ways to do this
# but they are not as polyglot
Post.eager_load(:translations).where(
id: Post.select(:id).limit(3) # limit only applies to the subquery
)
But I would profile it first to see if the juice is worth the sqeeze. And also get rid of that SQL query in the view.
Upvotes: 0
Reputation: 128
If incase you are trying to get some very specific attributes from your collection, then the CollectionProxy would provide you with single query. If none of the attributes(columns) are provided, it does a distinct query before doing the OUTER JOIN
query.
Honestly I haven't read through the whole implementation to confirm the reasoning behind it.
But let me show you one thing.
2.5.1 :037 > Post.eager_load(:translations).each do |post|
2.5.1 :038 > puts post.title
2.5.1 :039?> end
SQL (0.5ms) SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."created_at" AS t0_r2, "posts"."updated_at" AS t0_r3, "post_translations"."id" AS t1_r0, "post_translations"."title" AS t1_r1, "post_translations"."content" AS t1_r2, "post_translations"."locale" AS t1_r3, "post_translations"."post_id" AS t1_r4, "post_translations"."created_at" AS t1_r5, "post_translations"."updated_at" AS t1_r6 FROM "posts" LEFT OUTER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id"
title 1
title 2
title 3
title 4
In this above case, you can see the eager_load does what you are expecting. The similar case where you don't mention the needed attributes, I think when it lazy evaluates, it picks up a distinct
query in addition to the OUTER JOIN
query
2.5.1 :040 > Post.eager_load(:translations)
SQL (0.3ms) SELECT DISTINCT "posts"."id" FROM "posts" LEFT OUTER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id" LIMIT ? [["LIMIT", 11]]
SQL (0.5ms) SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."created_at" AS t0_r2, "posts"."updated_at" AS t0_r3, "post_translations"."id" AS t1_r0, "post_translations"."title" AS t1_r1, "post_translations"."content" AS t1_r2, "post_translations"."locale" AS t1_r3, "post_translations"."post_id" AS t1_r4, "post_translations"."created_at" AS t1_r5, "post_translations"."updated_at" AS t1_r6 FROM "posts" LEFT OUTER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id" WHERE "posts"."id" IN (?, ?, ?, ?) [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
=> #<ActiveRecord::Relation [#<Post id: 1, title: nil, created_at: "2021-07-08 12:42:13", updated_at: "2021-07-09 15:32:48">, #<Post id: 2, title: nil, created_at: "2021-07-09 15:33:50", updated_at: "2021-07-09 15:33:50">, #<Post id: 3, title: nil, created_at: "2021-07-09 15:33:55", updated_at: "2021-07-09 15:33:55">, #<Post id: 4, title: nil, created_at: "2021-07-09 15:33:57", updated_at: "2021-07-09 15:33:57">]>
Hope this is someway helpful. Please post if any comments are there, so I can clarify if I could. :)
Upvotes: 0