shiftins
shiftins

Reputation: 97

Rails include returns filtered relations instead of all relations

I'm using an includes instead of a join because it runs faster but the statement is returning an association that doesn't include all of the data I'm looking for. It returns all of the left data, but only the right data that matches the query. Hopefully the examples below help clarify the problem and what I'm trying to achieve.

The join does seem to do what I'm after from a data and rails association perspective but executes a ton of queries and is much slower.

Setup and examples

class Species < ActiveRecord::Base
    has_many :species_types, foreign_key: 'species_id', primary_key: "id"

end

class SpeciesTypes < ActiveRecord::Base
    belongs_to :species, :foreign_key => "id", :primary_key => "species_id"

end

create_table "species", force: :cascade do |t|
    t.bigint "id"
    t.string "identifier"
end

create_table "species_types", force: :cascade do |t|
    t.bigint "species_id"
    t.bigint "type_id"
    t.string "name"
end

Table data to help visualize the queries below

Species

id identifier
1 furry
2 sleek
3 hairy
4 shiny
5 reflective
6 rough
7 rubbery

SpeciesTypes

species_id type_id identifier
1 1 hairy
1 2 metalic
2 3 skin
3 1 hairy
4 2 metalic
4 3 skin
5 3 skin
5 3 skin
6 2 metalic
7 2 metalic

I know the SpeciesTypes.type_id, and I'm looking to get all Species that have that type, including all of their SpeciesTypes.

Using includes

`species = Species.includes(:species_types).where(:species_types => {:type_id => 1})`

This does return all Species with a matching SpeciesType. However, instead of returning all Species with all SpeciesType it return all Species with only the SpeciesType that match the :type_id parameter. So, in this case you cannot reference all SpeciesTypes from the Species object (species[0].species_types). Does not return what was expected, although it makes sense why it does limit to the matched type_id.

Response from above query for Species
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types

[#<SpeciesTypes:0x0000ffff9ad73490                       
 species_id: 1,
 type_id: 1,
 identifier: hairy>]
I'm looking for this:
irb()$ species = Species.includes(:species_types).where(:species_types => {:type_id => 1})
irb()$ species[0].species_types

[#<SpeciesTypes:0x0000ffff9ad73490                       
 species_id: 1,
 type_id: 1,
 identifier: hairy>,
 <SpeciesTypes:0x0000ffff9ad73490                       
 species_id: 1,
 type_id: 2,
 identifier: metalic>,
 ]

Using joins

This is returning what I'm after (using join instead of includes) however the query is much much slower. I think I'm missing something obvious (or not obvious but fundamental)

species = Species.joins(:species_types).where(:species_types => {:type_id => 3})

The above returns the values that I expect but is a much slower query.

Can the includes query be updated to return all Species with all types that match the known :type_id?

Upvotes: 0

Views: 356

Answers (1)

max
max

Reputation: 102036

While its pretty natural to think that Species.includes(:species_types).where(:species_types => {:type_id => 3}) would load all the species and just eager load the species_types that match the where clause thats not how ActiveRecord and SQL actually works.

What this generates in terms of a query something like:

SELECT species.name AS t0_c1, species_types.id AS t1_c1 ...
LEFT OUTER JOIN species_types, t1
  ON species_types.specie_id = species.id
WHERE species_types.type_id = ?

When you use includes and reference the other table it delegates to .eager_load which loads both tables in a single database query.

The where clause here applies to the entire query and not just the joined assocation. Remember that this query returns a row for every species_types row (with duplicate data for the species table).

If you wanted to load just the records that match the condition you would need to put the restriction into the JOIN clause:

SELECT species.name AS t0_c1, ...
LEFT OUTER JOIN species_types, t1
  ON species_types.specie_id = species.id AND species_types.type_id = ?

Unfortunately ActiveRecord associations do not provide a way to do that.

The easiest solution to the problem is most likely to just query from the other end:

Type.find(1)
    .specie_types.where(specie: specie)

.joins is not the answer

You can't just replace includes with joins as they do very things.

joins just adds an INNER LEFT JOIN to the query but doesn't actually select any columns from the joined table. Its used to filter the assocation based on the joined table or to select aggregates. Not to prevent N+1 queries.

In this case it's most likely not the first query itself thats slower - rather you're creating a N+1 query when you iterate through specie_types as the assocation is not eager loaded / preloaded.

includes does an OUTER LEFT JOIN and will load the assocatiated records either in one or two queries depending on how its used.

Upvotes: 1

Related Questions