Joash Lewis

Reputation: 43

How can I compute two hashes without reading the same file twice?

I have a program which is going to be used on very large files (current test data is 250GB). I need to be able to calculate both MD5 and SHA1 hashes for these files. Currently my code drops the stream into MD5.Create().ComputeHash(Stream stream), and then does the same for SHA1. As far as I can tell, these calls read the file in 4096-byte blocks into a buffer internal to the hashing function, until the end of the stream.
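For comparison, the two-pass approach described above might look roughly like the sketch below. TwoPassHashing and HashTwice are hypothetical names, and rewinding the stream with Position = 0 is an assumption (reopening the file would behave the same way). Either way, every byte comes off the disk twice, because each ComputeHash(Stream) call reads the stream to its end.

```csharp
using System.IO;
using System.Security.Cryptography;

static class TwoPassHashing
{
    // Hypothetical helper illustrating the "one after the other" approach:
    // each ComputeHash(Stream) call consumes the whole stream, so the
    // file is read from disk twice.
    public static (byte[] Md5, byte[] Sha1) HashTwice(string path)
    {
        using (var input = File.OpenRead(path))
        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            byte[] md5Hash = md5.ComputeHash(input);   // first full pass over the file
            input.Position = 0;                        // rewind to the start
            byte[] sha1Hash = sha1.ComputeHash(input); // second full pass over the file
            return (md5Hash, sha1Hash);
        }
    }
}
```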

The problem is, doing this one after the other takes a VERY long time! Is there any way I can take data into a buffer and provide the buffer to BOTH algorithms before reading a new block into the buffer?

Please explain thoroughly as I'm not an experienced coder.

Upvotes: 4

Views: 1734

Answers (1)

Jon Skeet

Reputation: 1500923

Sure. You can call TransformBlock repeatedly, then TransformFinalBlock at the end, and then read the Hash property to get the final hash. So something like:

using System.IO;
using System.Security.Cryptography;

using (var md5 = MD5.Create()) // Or new MD5Cng() (Windows only)
using (var sha1 = SHA1.Create()) // Or new SHA1Cng() (Windows only)
using (var input = File.OpenRead("file.data"))
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    // Read each block from the file once and feed it to both algorithms.
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
        sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
    }
    // We have to call TransformFinalBlock, but we don't have any
    // more data - just provide 0 bytes.
    md5.TransformFinalBlock(buffer, 0, 0, buffer, 0);
    sha1.TransformFinalBlock(buffer, 0, 0, buffer, 0);

    byte[] md5Hash = md5.Hash;
    byte[] sha1Hash = sha1.Hash;
}

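Constructing MD5Cng and SHA1Cng directly (new MD5Cng(), new SHA1Cng()) gives you wrappers around native Windows CNG implementations, which are likely to be faster than the implementations returned by MD5.Create and SHA1.Create, but which will be a bit less portable (e.g. for PCLs). Note that calling MD5Cng.Create() or SHA1Cng.Create() does not do this: Create is a static method inherited from MD5 and SHA1, so those calls return the same default implementation as MD5.Create() and SHA1.Create().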
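On newer frameworks, System.Security.Cryptography.IncrementalHash offers the same read-once pattern without the TransformBlock/TransformFinalBlock pair: AppendData feeds bytes in, and GetHashAndReset finishes the hash and returns it. This is a swapped-in alternative rather than part of the approach above, and it assumes a framework that includes IncrementalHash (such as .NET Core or .NET 5+). SinglePassHashing, HashOnce and the 1 MB default buffer size are illustrative choices, not anything from the original answer.

```csharp
using System.IO;
using System.Security.Cryptography;

static class SinglePassHashing
{
    // Hypothetical helper: reads the file once and hands every block
    // to both hash algorithms before reading the next block.
    public static (byte[] Md5, byte[] Sha1) HashOnce(string path, int bufferSize = 1 << 20)
    {
        using (var md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        using (var sha1 = IncrementalHash.CreateHash(HashAlgorithmName.SHA1))
        using (var input = File.OpenRead(path))
        {
            var buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // The same bytes go to both algorithms.
                md5.AppendData(buffer, 0, bytesRead);
                sha1.AppendData(buffer, 0, bytesRead);
            }
            // GetHashAndReset finalizes each hash; no separate
            // "final block" call is needed.
            return (md5.GetHashAndReset(), sha1.GetHashAndReset());
        }
    }
}
```

Usage would be along the lines of var (md5Hash, sha1Hash) = SinglePassHashing.HashOnce("file.data");. The buffer size is arbitrary: a larger buffer means fewer Read calls, but whether it helps on a 250GB file is something to measure rather than assume.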

Upvotes: 12
