intoTHEwild

Reputation: 468

Unique id for a file in C#

I need to generate a unique id for files of up to 200-300 MB. The condition is that the algorithm should be quick; it should not take much time. I am selecting the files from a desktop and calculating a hash value like this:

HMACSHA256 myhmacsha256 = new HMACSHA256(key);
byte[] hashValue = myhmacsha256.ComputeHash(fileStream);

fileStream is a stream used to read the file's content. This method is going to take a lot of time, for obvious reasons. Does Windows generate a key for a file for its own bookkeeping that I could use directly? Is there any other way to tell whether two files are the same, instead of matching file names, which is not very foolproof?

Upvotes: 0

Views: 2628

Answers (4)

user

Reputation: 6947

If you want a "quick and dirty" check, I would suggest looking at CRC-32. It is extremely fast (the algorithm simply involves doing XOR with table lookups), and if you aren't too concerned about collision resistance, a combination of the file size and the CRC-32 checksum over the file data should be adequate. About 28.5 bits are required to represent the file size (that gets you to roughly 379 million bytes), which means that together with the 32-bit checksum you get an identifier of effectively just over 60 bits. I would use a 64-bit quantity to store the file size, for future proofing, but 32 bits would work too in your scenario.
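To make the size-plus-CRC-32 idea concrete, here is a minimal sketch. It assumes the common reflected CRC-32 polynomial (0xEDB88320, the one used by zip and PNG); the class name QuickFileId and the "size-crc" hex layout are made up for illustration and are not part of any library.

```csharp
using System;
using System.IO;

public static class QuickFileId
{
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Reads the stream in 64 KB chunks so the whole file never sits in memory.
    public static uint Crc32(Stream stream)
    {
        uint crc = 0xFFFFFFFFu;
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }

    // Hypothetical id layout: 64-bit file size, a dash, then the 32-bit CRC, all in hex.
    public static string Compute(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            long size = stream.Length;
            uint crc = Crc32(stream);
            return size.ToString("X16") + "-" + crc.ToString("X8");
        }
    }
}
```

Calling QuickFileId.Compute(path) still reads the whole file once, so the saving over a cryptographic hash is in CPU time per byte, not in disk I/O.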

If collision resistance is a consideration, then you pretty much have to use one of the tried-and-true-yet-unbroken cryptographic hash algorithms. I would still concur with what Devils child wrote, however, and include the file size as a separate (readily accessible) part of the hash: if the sizes don't match, there is no chance that the file content can be the same, so in that case the computationally intensive hash calculation can be skipped.
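A rough sketch of that "size first, hash only if needed" order, assuming SHA-256 as the cryptographic hash (my choice for the example; any unbroken algorithm would do). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class HashCompare
{
    // Streams the file through SHA-256 and returns the 32-byte digest.
    public static byte[] Sha256Of(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
            return sha.ComputeHash(stream);
    }

    public static bool ProbablySameContent(string pathA, string pathB)
    {
        // Different sizes mean different content, so the expensive hash is skipped.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        return Sha256Of(pathA).SequenceEqual(Sha256Of(pathB));
    }
}
```

If you store the size next to the hash for every known file, the same shortcut works against a whole catalogue: only files whose stored size matches need to have their hashes compared.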

Upvotes: 0

mtijn

Reputation: 3678

MD5.Create().ComputeHash(fileStream);

Alternatively, I'd suggest looking at this rather similar question.

Upvotes: 1

bytecode77

Reputation: 14820

When you compute hashes and compare them, both files have to be read completely. My suggestion is to first check the file sizes and, only if they are identical, go through the files byte by byte. That comparison can stop at the first byte that differs.
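Something along these lines would do it. This is only a sketch: the names FileCompare, ContentEquals and FillBuffer are invented for the example, and the 64 KB buffer size is an arbitrary assumption.

```csharp
using System;
using System.IO;

public static class FileCompare
{
    public static bool ContentEquals(string pathA, string pathB)
    {
        // Cheap check first: files of different length cannot be identical.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            var bufA = new byte[64 * 1024];
            var bufB = new byte[64 * 1024];
            while (true)
            {
                int readA = FillBuffer(a, bufA);
                int readB = FillBuffer(b, bufB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams ended without a difference

                for (int i = 0; i < readA; i++)
                    if (bufA[i] != bufB[i]) return false; // stop at the first mismatch
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int FillBuffer(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

This only helps when you have both files at hand at the same time. If you need to remember a file and recognise it later, you still need a stored id of some kind.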

Upvotes: 0

Marcel

Reputation: 974

How about generating a hash from the info that's readily available from the file itself? For example, concatenate:

  • File Name
  • File Size
  • Created Date
  • Last Modified Date

and create your own?
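As a sketch of what that could look like: the separator, the use of UTC ticks and the final SHA-256 over the concatenated string are all assumptions made for the example, and MetadataId is a made-up name.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class MetadataId
{
    public static string Compute(string path)
    {
        var info = new FileInfo(path);

        // Concatenate name, size, created date and last modified date.
        string raw = string.Join("|",
            info.Name,
            info.Length.ToString(),
            info.CreationTimeUtc.Ticks.ToString(),
            info.LastWriteTimeUtc.Ticks.ToString());

        // Hash the short metadata string, not the file content, so this stays fast
        // no matter how large the file is.
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(raw));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Keep in mind that this identifies the file entry rather than its content: a copy of the same file will typically get a new creation time and therefore a different id, and two files with different content can share all four values.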

Upvotes: 0
