Peter Giacomo Lombardo

Reputation: 712

How to add a paperclip style to 13k images (on AWS S3) without running out of memory?

I added two styles (smallcard and mediumcard) to the Paperclip attachment on my Screenshot model:

class Screenshot < ActiveRecord::Base
  has_attached_file :image,
    :styles => { :tiny => "x75", :small => "x245", :medium => "x480", :large => "1280x900>",
                 :smallcard => "280x245#", :mediumcard => "570x480#" },
    :storage => :s3,
    :s3_credentials => "#{Rails.root}/config/amazon_s3.yml",
    :path => "/screenshots/:id_partition/:style/:filename"
end

I hand-created a public/system/paperclip_attachments.yml file so that the pre-existing styles would not be reprocessed:

---
:Screenshot:
  :image:
    - :tiny
    - :small
    - :medium
    - :large
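Instead of writing the registry by hand, it could be generated from a Ruby hash. This is only a sketch: it assumes the file format is exactly the one shown above (symbol keys for the class, the attachment and each style), and the registry variable name is made up for illustration.

```ruby
require 'yaml'

# Styles that already exist on S3 and should be treated as processed.
# Class, attachment and style names are taken from the model above;
# the new :smallcard and :mediumcard styles are deliberately left out.
registry = { :Screenshot => { :image => [:tiny, :small, :medium, :large] } }

yaml_text = registry.to_yaml
puts yaml_text

# In the app, the text would then be written to the path used above, e.g.
# File.write(Rails.root.join("public/system/paperclip_attachments.yml"), yaml_text)
```

The list items may come out with different indentation than in the hand-written file, which is equivalent YAML.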

But when I run rake paperclip:refresh:missing_styles CLASS=Screenshot, I still get the following:

Regenerating Screenshot -> image -> [:mediumcard, :smallcard]
rake aborted!
Cannot allocate memory - identify -format %wx%h '/tmp/79a229e96ab52dfa760132958da47bf320120806-31260-1eleoww[0]'
Tasks: TOP => paperclip:refresh:missing_styles
[clip]

When I tail the logs, processing only gets as far as IDs in the 500s.

The server is admittedly a Linode 512 running Ubuntu, but it has been rock solid at serving pages for 3 Rails apps and 1 PHP app for years. I've never run out of memory on it before.

Monitoring the rake task process, I can see its memory grow with each processed image until it eats up all available RAM.

Maybe it's time for my Linode to grow, but first I'm hoping for some other options.

How can I get around this memory issue and add these two styles to the pre-existing 13k images?

Thanks for your help!

Upvotes: 0

Views: 885

Answers (2)

Peter Giacomo Lombardo

Reputation: 712

Hopefully this can help someone else having the same issue.

As Chris suggested, I wrapped one rake task inside another, which calls it using %x{}. Each call runs in its own process, so the memory used by one batch is fully released before the next batch starts.

namespace :screenshots do
  desc "Incrementally rebuild thumbnails, one process per batch. BATCH_SIZE=10 & VERBOSE=false"
  task :reprocess_stepper => :environment do
    batch_size = (ENV['BATCH_SIZE'] || ENV['batch_size'] || 10).to_i
    # Only an explicit "true" enables verbose output. Any string, even "" or
    # "false", is truthy in Ruby, so the raw ENV value cannot be used as a flag.
    verbose    = (ENV['VERBOSE'] || ENV['verbose']).to_s.downcase == 'true'

    total = Screenshot.count
    start = 0

    while start < total
      command = "bundle exec rake screenshots:reprocess_some START=#{start} BATCH_SIZE=#{batch_size} VERBOSE=#{verbose} RAILS_ENV=#{Rails.env}"
      puts "Spawning: #{command}"
      # Each batch runs in a child process whose memory is released when it exits.
      puts %x{#{command}}
      start += batch_size
    end
  end

  desc "Reprocess a batch of screenshots. START=0 & BATCH_SIZE=10 & VERBOSE=false"
  task :reprocess_some => :environment do
    start      = (ENV['START'] || ENV['start'] || 0).to_i
    batch_size = (ENV['BATCH_SIZE'] || ENV['batch_size'] || 10).to_i
    verbose    = (ENV['VERBOSE'] || ENV['verbose']).to_s.downcase == 'true'

    puts "start = #{start} & batch_size = #{batch_size}" if verbose
    puts "RAILS_ENV=#{Rails.env}" if verbose

    # Ordering by id keeps the offset windows stable from one batch to the next.
    screenshots = Screenshot.order("id ASC").offset(start).limit(batch_size).all
    screenshots.each do |ss|
      puts "Re-processing paperclip image on screenshot ID: #{ss.id}" if verbose
      STDOUT.flush
      ss.image.reprocess!
    end
  end
end

You can then call this task as follows:

RAILS_ENV=production bundle exec rake screenshots:reprocess_stepper VERBOSE=true BATCH_SIZE=50
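To get a feel for how BATCH_SIZE trades memory against the number of child processes (each of which has to boot the Rails environment), here is a small sketch of the START values the stepper walks through. The batch_offsets helper is hypothetical and not part of the rake tasks, and 13,000 is just the rough image count from the question.

```ruby
# Offsets that would be passed as START for a given row count and batch size.
# Hypothetical helper, for illustration only.
def batch_offsets(total, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size > 0
  (0...total).step(batch_size).to_a
end

offsets = batch_offsets(13_000, 50)
puts "#{offsets.length} child processes, START from #{offsets.first} to #{offsets.last}"
# prints "260 child processes, START from 0 to 12950"
```

A larger batch means fewer Rails boots but a higher peak memory per child, so on a small server it is probably worth starting low and raising BATCH_SIZE while watching memory.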

Upvotes: 0

Chris

Reputation: 342

You need to give your system a chance to free the memory properly. A bold trick we used when confronted with a similar problem, using an ORM in a PHP batch task, is this: wrap your task in another task that calls the first task for only one item at a time.

In general, you should refactor the first task to take an argument identifying the base image. The second task should gather all images in a memory-friendly way (e.g. only the object ids), then loop through them and call the first task with each one as its argument. When the first task completes, its process exits and its memory is returned to the OS. The second (wrapper) task, on the other hand, never needs much memory at once. This way, peak memory usage should be what it takes to process one image, not all images.
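The wrapper idea can be sketched in plain Ruby, without Rails or Paperclip. Everything named here is an assumption for illustration: CHILD_SCRIPT stands in for the per-image task, process_each_in_child is a made-up helper, and the ids are arbitrary. The point is only that each item is handled by a fresh process that exits afterwards.

```ruby
require 'rbconfig'
require 'open3'

# Hypothetical stand-in for the per-item task. In a real app this would load
# one record by id and reprocess its attachment.
CHILD_SCRIPT = 'id = Integer(ARGV.fetch(0)); puts "processed #{id} in pid #{Process.pid}"'

# Runs the child script once per id, each time in a brand-new Ruby process,
# and returns the line each child printed.
def process_each_in_child(ids)
  ids.map do |id|
    output, status = Open3.capture2(RbConfig.ruby, '-e', CHILD_SCRIPT, id.to_s)
    raise "child failed for id #{id}" unless status.success?
    output.strip
  end
end

results = process_each_in_child([101, 102, 103])
puts results
```

The parent only ever holds a list of ids and one line of output per child, so its own memory stays flat however many items are processed.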

Upvotes: 2
