Mike Campbell
Mike Campbell

Reputation: 7978

Avoiding duplicate joins in Rails ActiveRecord query

I have a scenario where I have SQL joins somewhere in the query chain, and then at some further point I need to append a condition which needs the same join, but I don't know at this point whether that join exists already in the scope. For example:

@foo = Foo.joins("INNER JOIN foos_bars ON foos_bars.foo_id = foos.id")
....
@foo.joins(:bars).where(bars: { id: 1 })

This will product an SQL error about duplicate table/alias names.

The reason I write the SQL join manually in the first instance is to improve the efficiency as the classic rails AREL join will product two INNER JOINS where in my case I only need the one.

Is there a recommended way around this? Some way to inspect the joins currently in a scope for example.

Response to comment:

With a has_and_belongs_to_many relationship Rails produces two INNER JOINS like this:

SELECT "journals".* FROM "journals"
INNER JOIN "categories_journals"
  ON "categories_journals"."journal_id" = "journals"."id"
INNER JOIN "categories"
  ON "categories"."id" = "categories_journals"."category_id"
WHERE "categories"."id" = 1

Whereas I believe I can do this instead:

SELECT "journals".* FROM "journals"
INNER JOIN "categories_journals"
  ON "categories_journals"."journal_id" = "journals"."id"
WHERE "categories_journals"."category_id" = 1

Correct me if I'm wrong.

Upvotes: 1

Views: 2881

Answers (1)

Mike Campbell
Mike Campbell

Reputation: 7978

The solution was to universally use string joins. Unbeknownst to me Rails actually uniqs string joins -- so as long as they're string identical this problem doesn't occur.

This article put me on the scent, the author exhibits the exact same problem as me and patches Rails, and it looks like the patch was implemented a long time ago. I don't think it's perfect though. There should be a way for rails to handle hash parameter joins and string joins and not bomb out when they overlap. Might see if I can patch that ..

EDIT:

I did a couple of benchmarks to see if I was really worrying about nothing or not (between the two ways of joining):

1.9.3p194 :008 > time = Benchmark.realtime { 1000.times { a = Incident.joins("INNER JOIN categories_incidents ON categories_incidents.incident_id = incidents.id").where("categories_incidents.category_id = 1") } }
 => 0.042458 
1.9.3p194 :009 > time = Benchmark.realtime { 1000.times { a = Incident.joins(:categories).where(categories: { id: 1 }) } }
 => 0.152703

I'm not a regular benchmarker so my benchmarks may not be perfect but it looks to me as though my more efficient way does make real world performance improvements over large queries or lots of queries.

The downside of joining in the way I have done is that if a Category didn't exist but was still recorded in the join table then that might cause a few problems that would otherwise be avoidable with the more thorough join.

Upvotes: 1

Related Questions