Reputation: 6637
I have some rake scripts that operate on collections of hundreds of thousands of items.
Often, my server runs out of memory and the script crashes. I assume that this is because my code looks like this:
Asset.where(:archived => false).each { |asset| asset.action! }
As far as I can tell, Rails fetches the entire set into memory, and then iterates through each instance.
My server doesn't seem to be happy loading 300,000 instances of Asset
at once, so in order to reduce the memory requirements I've had to resort to something like this:
collection = Asset.where(:archived => false) # ActiveRecord::Relation
while collection.count > 0
  collection.limit(1000).each { |asset| asset.action! }
end
Unfortunately, that doesn't seem very clean. It gets even worse when the action doesn't remove items from the set, and I have to keep track of offsets too. Does anyone have suggestions for a better way of partitioning the data, or for holding onto the relation and only loading rows as they're needed?
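For what it's worth, here's roughly what the offset-tracking version ends up looking like (just a sketch, with an arbitrary batch size of 1000):

offset = 0
loop do
  # Pull the next slice of up to 1000 records; stop once we run out
  batch = Asset.where(:archived => false).offset(offset).limit(1000).to_a
  break if batch.empty?
  batch.each { |asset| asset.action! }
  offset += 1000
end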
Upvotes: 1
Views: 232
Reputation: 40277
The find_each method is designed for exactly this situation. It loads records in batches behind the scenes and yields them to your block one at a time:
Asset.where(:archived => false).find_each(:batch_size => 500) do |asset|
  asset.stuff
end
By default, the batch size is 1000.
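If you need the records a batch at a time instead of one by one (say, to do something with the whole group at once), find_in_batches works the same way but yields each batch as an array; a quick sketch:

Asset.where(:archived => false).find_in_batches(:batch_size => 500) do |assets|
  # assets is an array of up to 500 records
  assets.each { |asset| asset.stuff }
end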
Upvotes: 2