I have a web server where users upload their files. I want to implement logic that warns a user if they try to upload the same file twice.
My first idea is to save the md5_file() value to the database and then check whether any stored file has the same MD5 value. File sizes range from 2 MB up to 300 MB.
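A minimal sketch of the idea in the question, assuming a PHP array as a stand-in for the database table (a real application would query an indexed column that stores the digests). The function name `findDuplicateByMd5` is hypothetical.

```php
<?php
// Hypothetical stand-in for the database: an array mapping
// MD5 hex digest => name of the file already stored with that content.

/**
 * Returns the name of a previously stored file with the same MD5 digest,
 * or null if this content has not been seen before.
 */
function findDuplicateByMd5(string $path, array $knownHashes): ?string
{
    $digest = md5_file($path); // 32-character hex string, or false on failure
    if ($digest === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $knownHashes[$digest] ?? null;
}
```

Because only the 32-character digest is stored and compared, the lookup cost does not depend on whether the file is 2 MB or 300 MB; only computing the digest does.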
If you are just checking for duplicates, you can usually get away with using sha1.
Or, to bulletproof it:
```php
$hash = hash_file("sha512", $filename); // 128-character hex output
```
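(File size does not make accidental MD5 collisions more likely, but MD5 collisions can be crafted deliberately, so a stronger hash is the safer choice if someone might upload colliding files on purpose.)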
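As a sketch of how `hash_file()` can replace `md5_file()`, the snippet below computes several digests of one file, for example to size the database column that will hold them. The algorithm list and the helper name `fileDigests` are illustrative assumptions, not part of the answer above.

```php
<?php
// Compute several digests of the same file with hash_file().
// The chosen algorithms are only an illustration.
function fileDigests(string $path): array
{
    $digests = [];
    foreach (['md5', 'sha1', 'sha512'] as $algo) {
        $hex = hash_file($algo, $path); // lowercase hex string, or false on failure
        if ($hex === false) {
            throw new RuntimeException("Cannot hash $path with $algo");
        }
        $digests[$algo] = $hex;
    }
    return $digests;
}

// Demo on a throwaway temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, 'example contents');
foreach (fileDigests($tmp) as $algo => $hex) {
    printf("%-6s %3d hex chars\n", $algo, strlen($hex));
}
unlink($tmp);
```

The hex digests are 32, 40 and 128 characters long respectively, so switching from MD5 to SHA-512 roughly quadruples the storage per row.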
MD5 collisions are rare enough that in this case it shouldn't be an issue.
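To put "rare enough" into numbers, the usual birthday-bound approximation can be applied, assuming MD5 behaves like an ideal 128-bit hash for files nobody crafted on purpose (this assumption does not hold against a deliberate attacker). With $n$ stored files, the probability that any two share a digest is roughly:

$$
p \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{129}}\right) \approx \frac{n^2}{2^{129}},
\qquad n = 10^9 \;\Rightarrow\; p \approx \frac{10^{18}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-21}.
$$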
If you are dealing with large files, however, remember that the whole file has to be uploaded before you can even check whether it is a duplicate.
Upload -> MD5 -> Compare -> Keep or discard.
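The pipeline above might be sketched like this, under some hypothetical assumptions: `$knownHashes` stands in for the database (MD5 hex => stored path), accepted files are named after their digest inside `$storageDir`, and `keepOrDiscard` is an illustrative name. A real upload handler should use `move_uploaded_file()` on `$_FILES[...]['tmp_name']`; `rename()` is used here only so the sketch can run outside a web request.

```php
<?php
// Upload -> MD5 -> Compare -> Keep or discard.
// Returns the stored path for new content, or null for a duplicate.
function keepOrDiscard(string $tmpPath, string $storageDir, array &$knownHashes): ?string
{
    $digest = md5_file($tmpPath);              // hash the uploaded bytes
    if ($digest === false) {
        throw new RuntimeException("Cannot read $tmpPath");
    }
    if (isset($knownHashes[$digest])) {         // compare with stored hashes
        unlink($tmpPath);                       // duplicate: discard it
        return null;
    }
    $target = $storageDir . DIRECTORY_SEPARATOR . $digest;
    if (!rename($tmpPath, $target)) {           // new content: keep it
        throw new RuntimeException("Cannot store $tmpPath");
    }
    $knownHashes[$digest] = $target;            // remember it for later uploads
    return $target;
}
```

Note that the hash can only be computed once the full upload has arrived on the server, which is exactly the cost the answer warns about for large files.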
Yes, this is exactly what hashing is for. Consider using sha1, which is an all-around better hashing algorithm than MD5.
No, you probably shouldn't worry about collisions. The odds of anyone accidentally causing a collision are extremely low, close enough to impossible that you shouldn't spend any time on it up front. If you are seriously worried about it, use the hash as a first check, then compare the file sizes, and then compare the files byte by byte.
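The three-step check could look roughly like this. The helper name `filesAreIdentical` and the 8 KiB chunk size are illustrative assumptions; in practice the first file's hash would come from the database rather than being recomputed.

```php
<?php
// Hash first, then size, then a byte-by-byte comparison.
function filesAreIdentical(string $a, string $b, int $chunkSize = 8192): bool
{
    // 1. Different digests mean different contents.
    if (sha1_file($a) !== sha1_file($b)) {
        return false;
    }
    // 2. Different sizes also rule out a match.
    if (filesize($a) !== filesize($b)) {
        return false;
    }
    // 3. Same hash and size: confirm by comparing the contents chunk by chunk.
    $fa = fopen($a, 'rb');
    if ($fa === false) {
        throw new RuntimeException("Cannot open $a");
    }
    $fb = fopen($b, 'rb');
    if ($fb === false) {
        fclose($fa);
        throw new RuntimeException("Cannot open $b");
    }
    try {
        while (!feof($fa)) {
            if (fread($fa, $chunkSize) !== fread($fb, $chunkSize)) {
                return false;
            }
        }
        return true;
    } finally {
        fclose($fa);
        fclose($fb);
    }
}
```

Reading in fixed-size chunks keeps memory use flat even for 300 MB files, instead of loading both files with `file_get_contents()`.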