Reputation: 43
I have a program which is going to be used on very large files (current test data is 250GB). I need to be able to calculate both MD5 and SHA1 hashes for these files. Currently my code drops the stream into MD5.Create().ComputeHash(Stream stream), and then the same for SHA1. These, as far as I can tell, read the file in 4096-byte blocks to a buffer internal to the hashing function, until the end of the stream.
The problem is, doing this one after the other takes a VERY long time! Is there any way I can take data into a buffer and provide the buffer to BOTH algorithms before reading a new block into the buffer?
Please explain thoroughly as I'm not an experienced coder.
Upvotes: 4
Views: 1734
Reputation: 1500923
Sure. You can call TransformBlock repeatedly, then TransformFinalBlock at the end, and then use Hash to get the final hash. So something like:
    using (var md5 = MD5.Create()) // Or new MD5Cng()
    using (var sha1 = SHA1.Create()) // Or new SHA1Cng()
    using (var input = File.OpenRead("file.data"))
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        // Read each block once and feed it to both hash algorithms.
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
            sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
        }
        // We have to call TransformFinalBlock, but we don't have any
        // more data - just provide 0 bytes.
        md5.TransformFinalBlock(buffer, 0, 0);
        sha1.TransformFinalBlock(buffer, 0, 0);
        byte[] md5Hash = md5.Hash;
        byte[] sha1Hash = sha1.Hash;
    }
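The MD5Cng and SHA1Cng classes wrap native implementations, which are likely to be faster than the implementations returned by MD5.Create and SHA1.Create, but are a bit less portable (e.g. for PCLs). Construct them directly with new MD5Cng() and new SHA1Cng(): MD5Cng.Create and SHA1Cng.Create just resolve to the static Create methods inherited from MD5 and SHA1, so they return the default implementations rather than the CNG wrappers.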
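Since you asked for a thorough explanation, here is a minimal sketch of the same single-pass idea wrapped in a reusable helper, with a hex formatter so the result can be compared against published checksums. The DualHash class, its method names and the 1 MB buffer size are my own choices for illustration, not anything from the framework; the buffer size in particular is a guess you would want to tune by measuring on your own disks.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class; the name and shape are illustrative only.
static class DualHash
{
    // Reads the stream once and feeds every block to both algorithms.
    public static void Compute(Stream input, out byte[] md5Hash, out byte[] sha1Hash)
    {
        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            // Assumption: 1 MB per read; tune this by measurement.
            byte[] buffer = new byte[1024 * 1024];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Each call updates the running hash state with this block.
                md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
                sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
            }

            // Finish both hashes with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);

            md5Hash = md5.Hash;
            sha1Hash = sha1.Hash;
        }
    }

    // Formats a hash as lowercase hex, the form most checksum lists use.
    public static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

You would call it with something like DualHash.Compute(File.OpenRead(path), out md5, out sha1), ideally with the stream in its own using block. The file is read from disk only once, so with a 250GB input the run should be limited by the slower of the disk and the two hash computations combined, rather than by two full passes over the data.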
Upvotes: 12