Reputation: 22711
Given this model:
class User < ActiveRecord::Base
  has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There is a large number of @user.things, and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
- not load the entire thing_ids array into memory at once
- load only the thing_ids, and not instantiate a Thing for each id
Upvotes: 5
Views: 4628
Reputation: 15945
Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the relation's where_values_hash method to retrieve the already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
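The non-hacky variant could be wrapped in a small helper. This is only a sketch: each_id_batch is a hypothetical name, not a Rails method, and it assumes the primary key column is id. It relies only on in_batches(of:) and pluck, so it keeps working if the internals of in_batches change, at the cost of the second pluck query per batch mentioned above.

```ruby
# Hypothetical helper (not part of Rails): yields arrays of ids, one array per batch.
# Assumes `relation` responds to in_batches(of:) (Rails 5+) and that the
# primary key column is named id.
def each_id_batch(relation, batch_size: 1000)
  relation.in_batches(of: batch_size) do |batch_rel|
    # pluck returns plain values, so no Thing objects are instantiated here
    yield batch_rel.pluck(:id)
  end
end

# Usage, assuming @user is loaded as in the question:
#   each_id_batch(@user.things, batch_size: 500) { |ids| puts ids.size }
```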
Upvotes: 7
Reputation: 1299
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it yields the ActiveRecord relation to your block instead of an array of records, so you can then call pluck on it to get only the ids of the target query.
# put this file in your lib directory:
# active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern

  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)
    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000

    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end

    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end

    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation
    # keep the relation unloaded when :relation is set
    records = records.to_a unless options[:relation]

    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset

      yield records

      break if records_size < batch_size

      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end
end

ActiveRecord::Relation.send(:include, ARAExtension)
Here is the initializer:
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to receive the query before the conversion to an array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want
  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end an ActiveRecord Result is still created and mapped over, even when you use the pluck method.
Upvotes: 0
Reputation: 27961
Unfortunately, there is no one-liner or helper that will do this, so instead:
limit = 1000
offset = 0
loop do
  # order by id so that consecutive pages do not overlap or skip rows
  batch = @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.count < limit
  offset += limit
end
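As a possible alternative to OFFSET paging, the same loop can page by primary key, the way find_in_batches does with its "greater than the last id" condition. The sketch below is not a Rails API: each_id_batch_keyset is a hypothetical name, and it assumes an integer primary key column called id that is unambiguous in the query (no joins that introduce another id column).

```ruby
# Hypothetical helper: pages through ids by primary key instead of OFFSET.
# Assumes `relation` supports reorder, limit, where and pluck (as an
# ActiveRecord relation does) and that the primary key column is named id.
def each_id_batch_keyset(relation, batch_size: 1000)
  last_id = nil
  loop do
    scope = relation.reorder(:id).limit(batch_size)
    # after the first page, only fetch ids greater than the last one seen
    scope = scope.where("id > ?", last_id) if last_id
    ids = scope.pluck(:id)
    break if ids.empty?

    yield ids

    break if ids.size < batch_size
    last_id = ids.last
  end
end

# Usage, assuming @user is loaded as in the question:
#   each_id_batch_keyset(@user.things) { |ids| ids.each { |id| puts id } }
```

Each page is a single pluck query, so no Thing is instantiated and only one batch of ids is held in memory at a time.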
Upvotes: 0
Reputation: 9298
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids, although find_each still instantiates a Thing for each record.
Upvotes: -1
Reputation: 6568
You can try something like the code below; each_slice takes 4 elements at a time, and then you can loop over those 4:
@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
Upvotes: 0