Reputation: 729

Uploading thousands of images with Paperclip to S3

I have ~16,000 images I'm trying to upload to Amazon. Right now, they're on my local file system. I'd like to upload them to S3 using Paperclip, but I do NOT want to upload them to my server first. I'm using Heroku and they limit slug size.

Is there a way to use a rake task to upload the images directly from my local file system to S3 via Paperclip?

Upvotes: 3

Answers (3)

Charles Forcey

Reputation: 537

Great answer Johnny Grass and great question Chris. I had a few hundred tif files on my local machine, Heroku, paperclip, and s3. Some of the tiff files were > 100MB, so getting heroku to pay attention for that long required delayed job and some extra work. Since this was a mostly one time batch process (5 different image forms created from each with 5 x uploads), the idea of a rake task fit perfectly. Here, in case it helps, is the rake task I created assuming like Johnny wrote that your development database has current data (use pg backup to get fresh set of ids) and is connected to S3.

I have a model called "Item" with an attachment "image". I wanted to check if existing Items already had an image, and if not, upload a new one. The effect is to mirror a directory of source files. Good extensions might be to check the dates and see if the local tif if updated.

# lib/image_management.rake
namespace :images do
  desc 'upload images through paperclip with postprocessing'
  task :create => :environment do

    directory = "/Volumes/data/historicus/_projects/deeplandscapes/library/tifs/*.tif"
    images = Dir[directory]

    puts "\n\nProcessing #{ images.length } images in #{directory}..."

    items_with_errors = []
    items_updated = []
    items_skipped = []

    images.each do |image|
    # find the needed record
      image_basename = File.basename(image)
      id = image_basename.gsub("it_", "").gsub(".tif", "").to_i
      if id > 0
        item = Item.find(id) rescue nil
        # check if it has an image already
        if item
          unless item.image.exists?
            # create the image
            success = item.update_attributes(:image => File.open(image))
            if success
              items_updated << item
              print ' u '
            else
              items_with_errors << item
              print ' e '
            end
          else
            items_skipped << item
            print ' s '
          end
        else
          print "[#{id}] "
        end
      else
        print " [no id for #{image_basename}] "    
      end
    end
    unless items_with_errors.empty?
      puts "\n\nThe following items had errors: "
      items_with_errors.each do |error_image|
        puts "#{error_image.id}: #{error_image.errors.full_messages}"
      end
    end

    puts "\n\nUpdated #{items_updated.length} items."
    puts "Skipped #{items_skipped.length} items."
    puts "Update complete.\n"

  end
end

Upvotes: 0

David

Reputation: 7303

You can configure your app to use Amazon S3 for paperclip storage in development (see my example) and upload the files using a rake task like this:

Lets's say your folder of images was in your_app_folder/public/images, you can create a rake task similar to this.

namespace :images do
  desc "Upload images."
  task :create => :environment do
    @images = Dir["#{RAILS_ROOT}/public/images/*.*"]
    for image in @images
      MyModel.create(:image => File.open(image))
    end
  end
end

Upvotes: 4

pcg79

Reputation: 1283

Yes. I did something similar on my first personal Rails project. Here's a previous SO question (Paperclip S3 download remote images) whose answer links to the where I found my answer so long ago (http://trevorturk.com/2008/12/11/easy-upload-via-url-with-paperclip/).

Upvotes: 1

Uploading thousands of images with Paperclip to S3

Answers (3)

Related Questions