Reputation: 1270
I have a folder with around several thousand RGB 8-bit-per-channel image files on my computer that are anywhere between 2000x2000 and 8000x8000 in resolution (so most of them are extremely large).
I would like to store some small value, such as a hash, for each image so that I have a value to easily compare to in the future to see if any image files have changed. There are three primary requirements in the calculation of this value:
There are a lot of ways I could go about this, such as sha1, md5, etc, but the real goal here is speed, and really just any extremely quick way to identify if ANY change at all has been made to an image.
How would you achieve this in Python? Is there a particular hash algorithm you recommend for speed? Or can you devise a different way to achieve my three goals altogether?
Upvotes: -1
Views: 944
Reputation: 443
- The calculation of this value needs to be fast
- The result needs to be different if ANY part of the image file changes, even in the slightest amount, even if just one pixel changes. (The hash should not take filename into account).
- Collisions should basically never happen.
never
) happen. This is the nature of hash algorithms
Example to 1st (optimizing algorithm),
Optionally, before checking hashes, you can partially calculate and compare hashes, instead of all the file.
If most of your files will be more likely different, then checking other things before calculating hash probably will be faster.
But if most of your files will be identical, then the steps before the hashing will just consume more time. Because you'll already have to calculate the hash for most of files.
So try to implement most efficient algorithm according to your context.
Upvotes: 1