Schwern
Schwern

Reputation: 164769

How do I get has_and_belongs_to_many to do a bulk insert?

I have two classes with a has_and_belongs_to_many relationship.

class Customer < ApplicationRecord
  has_and_belongs_to_many :segments

  def rematch_segments
    self.segments = Segment.customer_segments(self)
  end
end

class Segment < ApplicationRecord
  has_and_belongs_to_many :customers

  class << self
    def customer_segments(customer)
      ...returns a collection of Segments...
    end
  end
end

Calling rematch_segments results in an insert for each segment.

(0.2ms)  BEGIN
Customer::HABTM_Segments Create (0.8ms)  INSERT INTO "customers_segments" ("customer_id", "segment_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4)  [["customer_id", 1], ["segment_id", 1], ["created_at", "2020-03-24 23:42:52.985400"], ["updated_at", "2020-03-24 23:42:52.985400"]]
Customer::HABTM_Segments Create (0.2ms)  INSERT INTO "customers_segments" ("customer_id", "segment_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4)  [["customer_id", 1], ["segment_id", 2], ["created_at", "2020-03-24 23:42:52.987537"], ["updated_at", "2020-03-24 23:42:52.987537"]]
Customer::HABTM_Segments Create (0.2ms)  INSERT INTO "customers_segments" ("customer_id", "segment_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4)  [["customer_id", 1], ["segment_id", 3], ["created_at", "2020-03-24 23:42:52.988610"], ["updated_at", "2020-03-24 23:42:52.988610"]]
(0.3ms)  COMMIT

How can I instead do a single bulk insert?

Upvotes: 2

Views: 2221

Answers (3)

Sunny
Sunny

Reputation: 6063

You can make the association visible by creating a class for your join table:

class Customer < ApplicationRecord
  has_many :segments, through: :customer_segments
  has_many :customer_segments
end

class Segment < ApplicationRecord
  has_many :customers, through: :customer_segments
  has_many :customer_segments
end

class CustomerSegment < ApplicationRecord
  belongs_to :customer
  belongs_to :segment
end

This way you can use Rails 6's insert_all() (or the activerecord-import gem) to bulk insert the join table directly:

CustomerSegment.insert_all([
  { customer_id: 1, segment_id: 1 },
  { customer_id: 1, segment_id: 2 },
  …
])

Upvotes: 4

asimhashmi
asimhashmi

Reputation: 4378

Rails 6 now supports bulk insert feature which can be used to insert multiple records in database in a single query. There is a insert_all method which accepts an array of hash representing each row to be inserted in the database. By default it skips duplicate rows

Inserts multiple records into the database in a single SQL INSERT statement. It does not instantiate any models nor does it trigger Active Record callbacks or validations. Though passed values go through Active Record's type casting and serialization.

The attributes parameter is an Array of Hashes. Every Hash determines the attributes for a single row and must have the same keys.

Rows are considered to be unique by every unique index on the table. Any duplicate rows are skipped. Override with :unique_by (see below).

Assuming you've a CustomerSegment class representing your join model.

You can do something like this:

CustomerSegment.insert_all([{ customer_id: 54, segment_id: 2 },{ customer_id: 143, segment_id: 222 }...])

Upvotes: 2

John
John

Reputation: 3535

If you are focusing on performance, I would recommend using raw SQL rather than ORM which can have performance drawbacks relatively.

To simplify code, ['created_at', 'updated_at'] were removed.

1. Using raw query

You may consider building INSERT QUERY like the following.

INSERT INTO customers_segments (customer_id,segment_id) VALUES (1,1),(1,2),(1,3) ...

According to records size, you may need to set batch size. Usually, it should be fine to manage thousands records without batch size.

segments = Segment.find_by_attr(:attr_val) #please do not forget indexing of the attr in DB.
customer = Customer.create
stmt = segments.map {|seg| "(#{customer.id},#{seg.id})"}.join(",")

ActiveRecord::Base.connection.execute("INSERT INTO customers_segments (customer_id, segment_id) VALUES #{stmt}")

2. implement raw query with ORM

class Customer < ApplicationRecord
  ...
  def batch_rematch(attr_array)
    attr_val = attr_array.map{|a| %Q{'#{a}'} }.uniq.join(",")
    self.connection.execute(%Q{insert into customers_segments (customer_id,segment_id) 
                     select distinct customers.id,segments.id 
                     from customers,segments
                     where customers.id = #{self.id} 
                     and segments.<attr> in (#{attr_val})})
  end
end

Upvotes: 0

Related Questions