neezer

Reputation: 20570

Rails: Preventing Duplicate Photo Uploads with Paperclip?

Is there any way to add a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality.

I'm using Rails 2.3.5 and Paperclip (obviously).


SOLUTION: (or one of them, at least)

Using Beerlington's suggestion, I decided to go with an MD5 checksum comparison:

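require 'digest/md5'

class Photo < ActiveRecord::Base
  #...
  has_attached_file :image #, ...

  before_validation_on_create :generate_md5_checksum
  # Run only on create: on update, the record would match its own checksum
  validate_on_create :unique_photo
  #...

  # Store the MD5 hex digest of the uploaded file's contents
  def generate_md5_checksum
    file = image.to_file
    return if file.nil? # no attachment, nothing to hash
    self.md5_checksum = Digest::MD5.hexdigest(file.read)
    file.rewind # leave the file at the start for later readers
  end

  # Reject the upload if this user already has a photo with the same digest
  def unique_photo
    return if md5_checksum.nil?
    if Photo.exists?(:user_id => user_id, :md5_checksum => md5_checksum)
      errors.add_to_base "You have already uploaded that file!"
    end
  end

  # ...
end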

Then I added a column called md5_checksum to my photos table, and voilà! Now my app shows a validation error if you try to upload the same photo twice!

I have no idea how efficient this is, so refactoring suggestions are welcome!
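On the efficiency question: the checksum above reads the whole upload into memory with read before hashing. That is usually fine for photos, but a minimal plain-Ruby sketch of the alternative feeds the digest in chunks and yields the same hex string. The helper name md5_of_io and the 64 KB default chunk size are illustrative assumptions, not part of Paperclip or Rails.

```ruby
require 'digest/md5'
require 'stringio'

# Hypothetical helper: hash an IO in fixed-size chunks so the whole file
# never has to sit in memory at once, then rewind it for later readers.
def md5_of_io(io, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  buffer = String.new
  while io.read(chunk_size, buffer) # returns nil at end of file
    digest.update(buffer)
  end
  io.rewind
  digest.hexdigest
end

data = "fake image bytes " * 10_000
streamed = md5_of_io(StringIO.new(data), 1024)
whole    = Digest::MD5.hexdigest(data)
puts streamed == whole # prints "true"
```

If you have a file path rather than an open IO, the standard library's Digest::MD5.file(path).hexdigest performs the same chunked read for you.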

Thanks!

Upvotes: 10

Views: 4674

Answers (4)

user3143898

Reputation: 61

You might run into a problem when your images have amended EXIF metadata. This happened to me: I had to extract the pixel values and calculate MD5s from them, so that changes made by WordPress and similar tools were ignored. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to extract the pixel data from the image with a tool like RMagick, concatenate it into a string, and calculate the MD5 of that string.
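The idea of hashing decoded pixels instead of file bytes can be illustrated without RMagick. This sketch is a stand-in built on stated assumptions: it uses the uncompressed binary PPM (P6) format, whose "#" header comments play the role of EXIF metadata, and the helpers ppm_pixels and pixel_md5 are hypothetical. For real JPEGs you would first decode the image with a library such as RMagick. Like the approach described above, it only matches images whose pixel values are identical; resized or recompressed copies need a perceptual hash such as the pHash technique in the linked post.

```ruby
require 'digest/md5'
require 'strscan'

# Hypothetical helper: return only the raster bytes of an 8-bit binary PPM
# (P6), skipping the header and any "# ..." comment lines inside it.
def ppm_pixels(data)
  s = StringScanner.new(data.b)
  fields = []
  while fields.size < 4 # magic number, width, height, maxval
    next if s.skip(/\s+/) || s.skip(/#[^\n]*\n?/)
    token = s.scan(/\S+/) or raise ArgumentError, "truncated PPM header"
    fields << token
  end
  magic, width, height, maxval = fields
  raise ArgumentError, "not an 8-bit P6 PPM" unless magic == "P6" && maxval.to_i < 256
  s.skip(/\s/) # exactly one whitespace byte separates header and raster
  s.rest[0, width.to_i * height.to_i * 3]
end

# MD5 of the pixel data only, so metadata edits do not change the digest
def pixel_md5(data)
  Digest::MD5.hexdigest(ppm_pixels(data))
end

# Two 1x1 red images that differ only in a header comment
original = "P6\n1 1\n255\n\xFF\x00\x00".b
edited   = "P6\n# edited by WordPress\n1 1\n255\n\xFF\x00\x00".b

puts Digest::MD5.hexdigest(original) == Digest::MD5.hexdigest(edited) # prints "false"
puts pixel_md5(original) == pixel_md5(edited)                          # prints "true"
```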

Upvotes: 3

Howler

Reputation: 2262

For anyone else trying to do this: Paperclip now has MD5 hashing built in. If your model has an [attachment]_fingerprint column, Paperclip will populate it with the MD5 digest of the uploaded file.

Since I already had a column named hash_value, I made a "virtual" attribute called picture_fingerprint that reads and writes it:

# Virtual attribute so Paperclip writes its MD5 fingerprint into hash_value
def picture_fingerprint
  hash_value
end

def picture_fingerprint=(md5_hash)
  self.hash_value = md5_hash
end

Then, with Rails 3's "sexy validations", I simply added this to the top of my model to ensure that hash_value is unique before the model is saved:

validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }

Upvotes: 10

Peter Brown

Reputation: 51717

What about computing an MD5 hash of the image file? If the two uploads are byte-for-byte identical, their MD5 hashes will be the same.
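A minimal plain-Ruby sketch of that comparison, scoped per user like the validation in the question. The DigestIndex class is hypothetical and keeps digests in memory; in the Rails app, the md5_checksum column and the lookup against it play this role.

```ruby
require 'digest/md5'
require 'set'

# Hypothetical in-memory stand-in for the md5_checksum column:
# remembers which digests each user has already uploaded.
class DigestIndex
  def initialize
    @seen = Hash.new { |hash, user_id| hash[user_id] = Set.new }
  end

  # Returns true and records the digest if this user has not uploaded these
  # exact bytes before; returns false for a byte-for-byte duplicate.
  def add?(user_id, bytes)
    !@seen[user_id].add?(Digest::MD5.hexdigest(bytes)).nil?
  end
end

index = DigestIndex.new
puts index.add?(1, "same bytes")  # prints "true"
puts index.add?(1, "same bytes")  # prints "false" (duplicate for user 1)
puts index.add?(2, "same bytes")  # prints "true"  (different user)
puts index.add?(1, "same bytes!") # prints "true"  (different content)
```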

Upvotes: 11

sosborn

Reputation: 14694

As Stephen indicated, your biggest issue is how to determine if a file is a duplicate, and there is no clear answer for this.

If these are photos taken with a digital camera, you would want to compare their EXIF data. If the EXIF data matches, the photo is most likely a duplicate, and you can inform the user of this. You'll have to accept the upload initially, though, so that you can examine the EXIF data.

I should mention that EXIFR is a nice Ruby gem for examining EXIF data.
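A sketch of the EXIF comparison, assuming the tags have already been read into plain hashes (a gem such as EXIFR can supply values like the camera make, model and original capture time). Treating exactly these three tags as identifying a shot, and the likely_duplicate? helper itself, are illustrative assumptions, not a rule from this answer. As the answer says, a match only means the photo is most likely a duplicate.

```ruby
# Tags treated as identifying a single exposure (an assumption for this sketch)
KEY_TAGS = [:make, :model, :date_time_original].freeze

# Hypothetical helper: two photos are "likely duplicates" when every key tag
# is present in both and all of them match. Missing tags never count as a match.
def likely_duplicate?(exif_a, exif_b)
  KEY_TAGS.all? do |tag|
    !exif_a[tag].nil? && exif_a[tag] == exif_b[tag]
  end
end

shot     = { make: "Canon", model: "Canon EOS 5D", date_time_original: Time.utc(2010, 3, 1, 12, 0, 0) }
reupload = shot.merge(software: "Photoshop") # edited copy, same exposure
other    = shot.merge(date_time_original: Time.utc(2010, 3, 1, 12, 0, 5))

puts likely_duplicate?(shot, reupload) # prints "true"
puts likely_duplicate?(shot, other)    # prints "false"
puts likely_duplicate?({}, {})         # prints "false"
```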

Upvotes: 0
