John Bachir

Reputation: 22711

Equivalent of find_each for foo_ids?

Given this model:

class User < ActiveRecord::Base
  has_many :things
end

Then we can do this:

@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }

There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?

The goal is to:

Upvotes: 5

Views: 4628

Answers (5)

Halil Özgür

Reputation: 15945

Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the relation's where_values_hash method to retrieve the already-plucked ids:

@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }

Note that in_batches has order and limit restrictions similar to find_each.

This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.

Upvotes: 7

mkralla11

Reputation: 1299

UPDATE Final EDIT:

I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)

Here is my solution, tested and working, so you can accept this as the answer if it pleases you.

Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it will return the activerecord relation to your block, so you can then use your desired method 'pluck' to get only the ids of the target query.

#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern

  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)

    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000

    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end

    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end

    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation

    records = records.to_a unless options[:relation]

    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset

      yield records

      break if records_size < batch_size

      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end

end

ActiveRecord::Relation.send(:include, ARAExtension)

Here is the initializer:

#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"

Originally, this method forced a conversion of the relation to an array of activerecord objects and returned it to you. Now, I optionally allow you to get the query back before the conversion to an array happens. Here is an example of how to use it:

@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want

  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end

Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.

Previous EDIT

In response to your comment (because my comment would not fit):

  1. calling thing_ids internally uses pluck
  2. pluck internally uses select_all
  3. ...which instantiates an activerecord Result

Previous 2nd EDIT:

This line of code within pluck returns an activerecord Result:

 ....
 result = klass.connection.select_all(relation.arel, nil, bound_attributes)
 ...

I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.

Upvotes: 0

smathy

Reputation: 27961

Unfortunately, there is no one-liner or helper that will do this for you, so instead:

limit = 1000
offset = 0
loop do
  # order by id so that successive offsets page through a stable ordering
  batch = @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.count < limit
  offset += limit
end
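
With offset-based paging the database still has to skip over all the earlier rows on every query, so a common alternative is to page by "id greater than the last id seen" (keyset pagination). Below is a minimal sketch of that idea, not something from Rails itself: each_id_batch and the fetch callable are hypothetical names, and an in-memory array stands in for the database so the snippet runs without Rails.

```ruby
# Hypothetical helper: yields ids in batches using keyset pagination.
# `fetch` is any callable taking (after_id, limit) and returning up to
# `limit` ids greater than `after_id`, in ascending order.
def each_id_batch(fetch, batch_size: 1000)
  last_id = nil
  loop do
    ids = fetch.call(last_id, batch_size)
    break if ids.empty?

    yield ids
    # a short batch means there is nothing left to fetch
    break if ids.size < batch_size

    last_id = ids.last
  end
end

# In-memory stand-in for the things table (assumption, for illustration only).
all_ids = [3, 7, 8, 15, 21, 22, 40]
fetch_from_memory = lambda do |after_id, limit|
  all_ids.select { |id| after_id.nil? || id > after_id }.first(limit)
end

each_id_batch(fetch_from_memory, batch_size: 3) { |ids| p ids }
```

With ActiveRecord, the callable would presumably be something along the lines of @user.things.order(:id).limit(limit).pluck(:id), with a where("things.id > ?", after_id) added once after_id is set.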

Upvotes: 0

JeffD23

Reputation: 9298

I would use something like this:

@user.things.find_each(batch_size: 1000).map(&:id)

This will give you an array of the ids, although every Thing record is still instantiated along the way.

Upvotes: -1

AnkitG

Reputation: 6568

You can try something like the following. each_slice takes 4 elements at a time, and then you can loop over those 4:

@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
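
Keep in mind that each_slice only chunks an array that is already in memory. As far as I know, thing_ids fetches all the ids up front, so this batches the iteration but not the query. A small plain-Ruby illustration, where the ids array is just a stand-in for @user.thing_ids:

```ruby
# Stand-in for @user.thing_ids: the full list is already loaded here.
ids = (1..10).to_a

# each_slice hands out consecutive chunks of at most 4 elements.
ids.each_slice(4) do |batch|
  p batch
end
```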

Upvotes: 0
