Reputation: 468
I need to generate a unique ID for files of up to 200-300 MB. The algorithm should be quick; it should not take much time. I am selecting the files from a desktop and calculating a hash value like this:
HMACSHA256 myhmacsha256 = new HMACSHA256(key);
byte[] hashValue = myhmacsha256.ComputeHash(fileStream);
fileStream is a handle to the file from which the content is read. This method is going to take a lot of time for obvious reasons. Does Windows generate a key for a file for its own bookkeeping that I could use directly? Is there any other way to identify whether a file is the same, instead of matching the file name, which is not foolproof?
Upvotes: 0
Views: 2628
Reputation: 6947
If you want a "quick and dirty" check, I would suggest looking at CRC-32. It is extremely fast (the table-driven algorithm involves only XORs, shifts, and table lookups), and if you aren't too concerned about collision resistance, a combination of the file size and the CRC-32 checksum over the file data should be adequate. About 28.5 bits are required to represent the file size (2^28.5 is roughly 379 million bytes), so together with the 32-bit CRC you effectively get a checksum value of just over 60 bits. I would use a 64-bit quantity to store the file size, for future-proofing, but 32 bits would work too in your scenario.
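As a rough illustration of the size-plus-CRC-32 idea, here is a minimal sketch. The class name `QuickFileId` and the "size-crc" hex format are made up for this example, and the CRC is implemented by hand (the common reflected polynomial 0xEDB88320) only to keep the snippet free of extra packages.

```csharp
using System;
using System.IO;

static class QuickFileId
{
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320 (the zlib/PNG variant).
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Reads the stream in 64 KiB chunks so a 300 MB file is never held in memory at once.
    public static uint Crc32(Stream stream)
    {
        uint crc = 0xFFFFFFFFu;
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }

    // Hypothetical ID format: 64-bit size and 32-bit CRC as hex, joined by '-'.
    public static string Compute(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            long size = stream.Length;
            uint crc = Crc32(stream);
            return size.ToString("X16") + "-" + crc.ToString("X8");
        }
    }
}
```

Calling `QuickFileId.Compute(path)` still reads the whole file once, so the cost is dominated by disk I/O rather than by the checksum arithmetic.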
If collision resistance is a consideration, then you pretty much have to use one of the tried-and-true-yet-unbroken cryptographic hash algorithms. I would still concur with what Devils child wrote and also include the file size as a separate (readily accessible) part of the hash, however; if the sizes don't match, there is no chance that the file content can be the same, so in that case the computationally intensive hash calculation can be skipped.
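A sketch of keeping the size as a separate part of the identifier, so the expensive hash is skipped whenever the sizes differ. `FileFingerprint` is a hypothetical type invented for this example, and SHA-256 is just one possible choice of cryptographic hash.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical record of a previously seen file: its size plus its SHA-256 digest.
sealed class FileFingerprint
{
    public long Size { get; }
    public byte[] Sha256 { get; }

    public FileFingerprint(long size, byte[] sha256)
    {
        Size = size;
        Sha256 = sha256;
    }

    public static FileFingerprint FromFile(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var sha = SHA256.Create())
            return new FileFingerprint(stream.Length, sha.ComputeHash(stream));
    }

    // Compares the cheap size first; the candidate is hashed only when the sizes match.
    public bool Matches(string candidatePath)
    {
        if (new FileInfo(candidatePath).Length != Size)
            return false;

        using (var stream = File.OpenRead(candidatePath))
        using (var sha = SHA256.Create())
            return sha.ComputeHash(stream).SequenceEqual(Sha256);
    }
}
```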
Upvotes: 0
Reputation: 3678
MD5.Create().ComputeHash(fileStream);
Alternatively, I'd suggest looking at this rather similar question.
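For completeness, the MD5 one-liner could be wrapped roughly like this, so that the hash object is disposed and the result comes back as a hex string. The helper name `Md5FileHash` is made up for this sketch. Note that MD5 is no longer considered collision resistant, so this suits detecting accidental differences rather than deliberate tampering.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class Md5FileHash
{
    // Returns the MD5 digest of the stream's remaining content as lowercase hex.
    public static string Compute(Stream stream)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```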
Upvotes: 1
Reputation: 14820
When you compute hashes and compare them, both files have to be read in full. My suggestion is to check the file sizes first and, only if they are identical, go through the files byte by byte.
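A minimal sketch of that approach, assuming both files are available locally at comparison time. `FileComparer` and its method names are hypothetical. The comparison stops at the first differing byte, so mismatching files are usually rejected without reading either one completely.

```csharp
using System;
using System.IO;

static class FileComparer
{
    public static bool AreSame(string pathA, string pathB)
    {
        // Different sizes mean different content; no need to open the files.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
            return StreamsEqual(a, b);
    }

    public static bool StreamsEqual(Stream a, Stream b)
    {
        var bufA = new byte[64 * 1024];
        var bufB = new byte[64 * 1024];
        while (true)
        {
            int readA = FillBuffer(a, bufA);
            int readB = FillBuffer(b, bufB);
            if (readA != readB) return false;   // one stream ended early
            if (readA == 0) return true;        // both ended with no difference found
            for (int i = 0; i < readA; i++)
                if (bufA[i] != bufB[i]) return false;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends.
    private static int FillBuffer(Stream s, byte[] buffer)
    {
        int total = 0;
        int read;
        while (total < buffer.Length &&
               (read = s.Read(buffer, total, buffer.Length - total)) > 0)
            total += read;
        return total;
    }
}
```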
Upvotes: 0
Reputation: 974
How about generating a hash from the info that's readily available from the file itself? i.e. concatenate :
and create your own?
Upvotes: 0