Rahul Tapali

Reputation: 10137

Batch processing in Rails

Rails query:

  Detail.created_at_gt(15.days.ago.to_datetime).find_each do |d|
    # Some code
  end

Equivalent MySQL query:

  SELECT * FROM `details` WHERE (details.id >= 0) AND
                 (details.created_at > '2012-07-01 12:22:32')
                  ORDER BY details.id ASC LIMIT 1000

With find_each, Rails adds the details.id >= 0 condition and orders the results by details.id in ascending order.

I want to avoid both of these because, in my case, the query scans the whole table when there is a lot of data to process (i.e. the index on created_at is not used), which is inefficient. Can anyone help?

Upvotes: 0

Views: 3294

Answers (3)

rogal111

Reputation: 5933

Here is the source of find_in_batches, which find_each uses:

http://apidock.com/rails/ActiveRecord/Batches/find_in_batches

Click the "Show source" link. The essential lines are:

relation = relation.reorder(batch_order).limit(batch_size)
records = relation.where(table[primary_key].gteq(start)).all

and

records = relation.where(table[primary_key].gt(primary_key_offset)).to_a

To process records in batches and select each following batch, you must order them by the primary key or another unique index. You can't batch by created_at alone because it is not unique. But you can combine ordering by created_at with selecting by the unique id:

relation = relation.reorder('created_at ASC, id ASC').limit(batch_size)
records = relation.where(table[primary_key].gteq(start)).all

#....

while records.any?
    records_size = records.size
    primary_key_offset = records.last.id
    created_at_key = records.last.created_at

    yield records

    break if records_size < batch_size

    if primary_key_offset
      records = relation.where('created_at>:ca OR (created_at=:ca AND id>:id)',:ca=>created_at_key,:id=>primary_key_offset).to_a
    else
      raise "Primary key not included in the custom select clause"
    end
end

If you are absolutely sure that no created_at value is shared by more than batch_size records, you could use created_at as the only key for batch processing.

In any case, you need an index on created_at for this to be efficient.
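
To make the cursor logic above easier to follow, here is a minimal plain-Ruby sketch that runs without a database. The Row struct, fetch_batch and each_batch are hypothetical names invented for this illustration, and the integer created_at values are stand-ins for real timestamps. fetch_batch only simulates what the WHERE ... ORDER BY created_at ASC, id ASC LIMIT batch_size query would return; it is not Rails code.

```ruby
# Hypothetical in-memory stand-in for the details table (not ActiveRecord).
Row = Struct.new(:id, :created_at)

# Simulates:
#   WHERE created_at > :ca OR (created_at = :ca AND id > :id)
#   ORDER BY created_at ASC, id ASC LIMIT batch_size
# `after` is the [created_at, id] pair of the last row already processed.
def fetch_batch(rows, batch_size, after = nil)
  candidates = rows
  if after
    ca, id = after
    candidates = rows.select do |r|
      r.created_at > ca || (r.created_at == ca && r.id > id)
    end
  end
  candidates.sort_by { |r| [r.created_at, r.id] }.first(batch_size)
end

# Yields successive batches, advancing the (created_at, id) cursor each time.
def each_batch(rows, batch_size)
  batch = fetch_batch(rows, batch_size)
  until batch.empty?
    yield batch
    break if batch.size < batch_size
    last = batch.last
    batch = fetch_batch(rows, batch_size, [last.created_at, last.id])
  end
end

# Three rows share created_at = 20, and they straddle a batch boundary.
rows = [
  Row.new(5, 10), Row.new(1, 20), Row.new(4, 20),
  Row.new(2, 20), Row.new(3, 30)
]

batches = []
each_batch(rows, 2) { |b| batches << b.map(&:id) }
p batches
```

Because the id breaks ties between rows with the same created_at, every row is visited exactly once even when duplicates fall on both sides of a batch boundary. A cursor on created_at alone with a strict > comparison would skip id 2 and id 4 in this data.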

Upvotes: 2

egoholic

Reputation: 317

It would be better to use scopes and Arel-style querying:

class Detail < ActiveRecord::Base
  table = self.arel_table

  scope :created_after, lambda { |date| where(table[:created_at].gt(date)).limit(1000) }
end

Then you can find 1000 records that were created after a given date:

@details = Detail.created_after(15.days.ago.to_datetime)

Upvotes: 0

Jason Kim

Reputation: 19031

Detail.where('created_at > ? AND id < ?', 15.days.ago.to_datetime, 1000).order('details.id ASC')

You don't have to explicitly check details.id >= 0 as Rails does it for you by default.

Upvotes: 0
