Reputation: 49
Integrity checking of files: do CRC32 or MD5 checksums generate "unpredictable" hash values?
When checking whether files are identical, a CRC32 or MD5 checksum is usually used. Each file that is possibly a duplicate of another is read from beginning to end, and a unique number is calculated from its binary content. This number is stored as a fingerprint and compared against the fingerprints of other files to determine if they are truly identical. That means a tiny change in a file results in a fairly large and "unpredictable" change in the generated hash.
Upvotes: 2
Views: 661
Reputation: 112239
This is not a proper use of the term "unpredictable". The algorithms are deterministic, which means that they will always produce the same output given the same input. Therefore they are entirely predictable.
Yes, for both CRC32 and MD5, a small change in the input will result in a "fairly large change" in the output, on the order of half of the output bits.
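To make both points concrete, here is a minimal Python sketch using the standard `zlib` and `hashlib` modules. The sample message and the helper name `bit_diff` are my own choices for illustration, not anything from the question. It hashes the same input twice to show determinism, then flips one input bit and counts how many output bits change.

```python
import hashlib
import zlib


def bit_diff(a: int, b: int) -> int:
    """Count the bit positions in which two integers differ (hypothetical helper)."""
    return bin(a ^ b).count("1")


# Arbitrary sample input, chosen only for illustration.
original = b"The quick brown fox jumps over the lazy dog"

# Flip the lowest bit of the first byte to get a one-bit change.
flipped = bytearray(original)
flipped[0] ^= 0x01
modified = bytes(flipped)

crc_a = zlib.crc32(original)
crc_b = zlib.crc32(modified)
md5_a = int.from_bytes(hashlib.md5(original).digest(), "big")
md5_b = int.from_bytes(hashlib.md5(modified).digest(), "big")

# Deterministic: hashing the same bytes again gives the same values.
same_crc_again = zlib.crc32(original) == crc_a
same_md5_again = int.from_bytes(hashlib.md5(original).digest(), "big") == md5_a

print("CRC32 repeatable:", same_crc_again)
print("MD5 repeatable:  ", same_md5_again)
print("CRC32 bits changed:", bit_diff(crc_a, crc_b), "of 32")
print("MD5 bits changed:  ", bit_diff(md5_a, md5_b), "of 128")
```

The exact counts depend on the input and on which bit is flipped, so treat any single run as one sample; averaged over many inputs the count tends toward half of the output width.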
These checks cannot be used to determine if two files "are truly identical". They can only indicate that there is a very high probability that the two files are identical. You'd need to directly compare the two files to determine if they are truly identical.
On the other hand, if the checks differ, then you know for certain that the files differ.
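The two cases above suggest a two-step check, sketched here in Python. The function names `md5_of_file` and `files_identical` and the chunk size are assumptions of mine for illustration. Differing checksums settle the question immediately, while matching checksums are followed by a direct byte-by-byte comparison via the standard library's `filecmp.cmp` with `shallow=False`.

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True only if the two files have exactly the same contents."""
    # Differing checksums prove that the files differ.
    if md5_of_file(path_a) != md5_of_file(path_b):
        return False
    # Matching checksums only make equality very likely, so confirm it
    # by comparing the actual bytes of the two files.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For a single pair of files the direct comparison alone would do. The checksum step pays off when fingerprints are stored and many files are compared against each other, since only candidates with matching fingerprints need the full comparison.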
Upvotes: 2