Reputation: 20570
Is there anyway to throw a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality...
I'm using Rails 2.3.5 and Paperclip (obviously).
SOLUTION: (or one of them, at least)
Using Beerlington's suggestion, I decided to go with an MD5 Checksum comparison:
class Photo < ActiveRecord::Base
#...
has_attached_file :image #, ...
before_validation_on_create :generate_md5_checksum
validate :unique_photo
#...
def generate_md5_checksum
self.md5_checksum = Digest::MD5.hexdigest(image.to_file.read)
end
def unique_photo
photo_digest = self.md5_checksum
errors.add_to_base "You have already uploaded that file!" unless User.find(self.user_id).photos.find_by_md5_checksum(photo_digest).nil?
end
# ...
end
Then I just added a column to my photos
table called md5_checksum
, and voila! Now my app throws a validation error if you try to upload the same photo!
No idea how efficient/inefficient this is, so refactoring's welcome!
Thanks!
Upvotes: 10
Views: 4674
Reputation: 61
You might run into a problem when your images have amended EXIF metadata. This happened to me, and I had to extract pixel values and calculate MD5s out of them, to ignore changes made by Wordpress etc. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to get the pixel data out of image with some tool (like RMagick), concatinate it to string, and calculate MD5 out of that.
Upvotes: 3
Reputation: 2262
For anyone else trying to do this. Paperclip now has md5 hashing built in. If you have a [attachment]_fingerprint in your model, paperclip will populate this with the MD5.
Since I already had a column named hash_value, I made a 'virtual' attribute called fingerprint
#Virtual attribute to have paperclip generate the md5
def picture_fingerprint
self.hash_value
end
def picture_fingerprint=(md5Hash)
self.hash_value=md5Hash
end
And, with rails3, using sexy_validations, I was able to simply add this to the top my my model to ensure that the hash_value is unique before it saves the model:
validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }
Upvotes: 10
Reputation: 51717
What about doing an MD5 on the image file? If it is the exact same file, the MD5 hash will be the same for both images.
Upvotes: 11
Reputation: 14694
As Stephen indicated, your biggest issue is how to determine if a file is a duplicate, and there is no clear answer for this.
If these are photos taken with a digital camera, you would want to compare the EXIF data. If the EXIF data matches then the photo is most likely a duplicate. If it is a duplicate then you can inform the user of this. You'll have to accept the upload initially though so that you examine the EXIF data.
I should mention that EXIFR is a nice ruby gem for examining the EXIF data.
Upvotes: 0