user11708636

Reputation: 51

Returning multiple values asynchronously

I need to compute these two independent tasks. Earlier I was doing it serially, like this:

string firstHash = CalculateMD5("MyName");
string secondHash = CalculateMD5("NoName");

The CalculateMD5 method looks like this. It is used to calculate MD5 hash values for files as big as 16 GB:

private string CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }
}

But since these two CalculateMD5 calls can run in parallel, I tried this:

Task<string> sequenceFileMd5Task = CalculateMD5("MyName");
Task<string> targetFileMD5task = CalculateMD5("NoName");
string firstHash = await sequenceFileMd5Task;
string secondHash = await targetFileMD5task;

And my CalculateMD5 method looks like:

private async Task<string> CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }
}

I hoped for the code to work asynchronously, but it works synchronously.

Upvotes: 0

Views: 703

Answers (3)

Matthew Watson

Reputation: 109537

One way to speed this up is to use double-buffering so that one thread can be reading from the file into one buffer, while the MD5 is being calculated for another buffer.

This allows you to overlap the I/O with the computation.

The best way to do this would be to have a single task that was responsible for calculating the Md5 for all the blocks of data, but since that complicates the code quite a bit (and is not likely to yield much better results) I shall instead create a new task for each block.

The code looks like this:

public static async Task<byte[]> ComputeMd5Async(string filename)
{
    using (var md5  = MD5.Create())
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 16384, FileOptions.SequentialScan | FileOptions.Asynchronous))
    {
        const int BUFFER_SIZE = 16 * 1024 * 1024; // Adjust buffer size to taste.

        byte[] buffer1 = new byte[BUFFER_SIZE];
        byte[] buffer2 = new byte[BUFFER_SIZE];
        byte[] buffer  = buffer1; // Double-buffered, so use 'buffer' to switch between buffers.

        var task = Task.CompletedTask;

        while (true)
        {
            buffer = (buffer == buffer1) ? buffer2 : buffer1; // Swap buffers for double-buffering.
            int n = await file.ReadAsync(buffer, 0, buffer.Length);

            await task;
            task.Dispose();

            if (n == 0)
                break;

            var block = buffer;
            task = Task.Run(() => md5.TransformBlock(block, 0, n, null, 0));
        }

        md5.TransformFinalBlock(buffer, 0, 0);

        return md5.Hash;
    }
}

Here's a compilable test app:

using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

namespace Demo
{
    class Program
    {
        static async Task Main()
        {
            string file = @"C:\ISO\063-2495-00-Rev 1.iso";

            Stopwatch sw = new Stopwatch();

            for (int i = 0; i < 4; ++i) // Try several times.
            {
                sw.Restart();

                var hash = await ComputeMd5Async(file);

                Console.WriteLine("ComputeMd5Async() Took " + sw.Elapsed);
                Console.WriteLine(string.Join(", ", hash));
                Console.WriteLine();

                sw.Restart();

                hash = ComputeMd5(file);

                Console.WriteLine("ComputeMd5() Took " + sw.Elapsed);
                Console.WriteLine(string.Join(", ", hash));
                Console.WriteLine();
            }
        }

        public static byte[] ComputeMd5(string filename)
        {
            using var md5    = MD5.Create();
            using var stream = File.OpenRead(filename);

            md5.ComputeHash(stream);

            return md5.Hash;
        }

        public static async Task<byte[]> ComputeMd5Async(string filename)
        {
            using (var md5  = MD5.Create())
            using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 16384, FileOptions.SequentialScan | FileOptions.Asynchronous))
            {
                const int BUFFER_SIZE = 16 * 1024 * 1024; // Adjust buffer size to taste.

                byte[] buffer1 = new byte[BUFFER_SIZE];
                byte[] buffer2 = new byte[BUFFER_SIZE];
                byte[] buffer  = buffer1; // Double-buffered, so use 'buffer' to switch between buffers.

                var task = Task.CompletedTask;

                while (true)
                {
                    buffer = (buffer == buffer1) ? buffer2 : buffer1; // Swap buffers for double-buffering.
                    int n = await file.ReadAsync(buffer, 0, buffer.Length);

                    await task;
                    task.Dispose();

                    if (n == 0)
                        break;

                    var block = buffer;
                    task = Task.Run(() => md5.TransformBlock(block, 0, n, null, 0));
                }

                md5.TransformFinalBlock(buffer, 0, 0);

                return md5.Hash;
            }
        }
    }
}

And the results I got for a file of size ~2.5GB:

ComputeMd5Async() Took 00:00:04.8066365
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5() Took 00:00:06.9654982
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5Async() Took 00:00:04.7018911
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5() Took 00:00:07.3552470
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5Async() Took 00:00:04.6536709
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5() Took 00:00:06.9807878
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5Async() Took 00:00:04.7271215
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

ComputeMd5() Took 00:00:07.4089941
49, 54, 154, 19, 115, 198, 28, 163, 5, 182, 183, 91, 2, 5, 241, 253

So the async double-buffered version finishes in roughly two-thirds of the time (about 4.7 s versus 7.0 to 7.4 s), which is about 50% higher throughput.

There may be faster ways, but this is a fairly simple approach.

Upvotes: 2

Anand Chapla

Reputation: 182

You can wrap the function body in Task.Run so that it executes on a thread-pool thread, and then await the result.

private async Task<string> CalculateMD5(string filename)
{
    return await Task.Run(() =>
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(filename))
            {
                var hash = md5.ComputeHash(stream);
                return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
            }
        }
    });
}
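To get both hashes concurrently, start both tasks before awaiting either of them. The following is a minimal sketch of that pattern using Task.WhenAll; the class name ParallelHashSketch and the method names CalculateMD5Async and HashBothAsync are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class ParallelHashSketch
{
    // Same hashing logic as above, moved onto a thread-pool thread by Task.Run.
    public static Task<string> CalculateMD5Async(string filename)
    {
        return Task.Run(() =>
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(filename))
            {
                byte[] hash = md5.ComputeHash(stream);
                return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
            }
        });
    }

    // Hypothetical helper: starts both hashes, then waits for both to finish.
    public static async Task<(string First, string Second)> HashBothAsync(string firstFile, string secondFile)
    {
        Task<string> first = CalculateMD5Async(firstFile);   // Starts running immediately.
        Task<string> second = CalculateMD5Async(secondFile); // Runs alongside the first.

        // WhenAll returns the results in the same order as the tasks were passed in.
        string[] hashes = await Task.WhenAll(first, second);
        return (hashes[0], hashes[1]);
    }
}
```

Whether this is actually faster than hashing the files one after the other depends on whether the disk can serve both reads at once.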

Upvotes: 2

Matthew Watson

Reputation: 109537

This is likely to be I/O limited, so parallelising it will probably not speed things up much (and indeed might even slow things down).

Having said that, the issue with your code is that you are not creating any new tasks to run the code in the background (just specifying async doesn't create any threads).
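A small sketch can make this visible. It assumes a stand-in method, NoAwaitAsync (a hypothetical name), that blocks with Thread.Sleep in place of md5.ComputeHash(stream). Because the method contains no await, it runs to completion on the caller's thread and hands back a task that is already finished.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncWithoutAwaitDemo
{
    // Marked async but contains no await, so the whole body runs synchronously
    // on the calling thread. The compiler reports warning CS1998 for this.
    public static async Task<int> NoAwaitAsync()
    {
        Thread.Sleep(200); // Stand-in for blocking work such as md5.ComputeHash(stream).
        return Environment.CurrentManagedThreadId;
    }

    // True if the task had already finished by the time the call returned.
    public static bool CompletedBeforeReturn()
    {
        Task<int> task = NoAwaitAsync(); // Blocks here for the full 200 ms.
        return task.IsCompleted;
    }

    // True if the body executed on the same thread that made the call.
    public static bool RanOnCallerThread()
    {
        Task<int> task = NoAwaitAsync();
        return task.Result == Environment.CurrentManagedThreadId;
    }
}
```

Both helper methods return true, which is why the two CalculateMD5 calls in the question still execute one after the other.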

Rather than trying to "force" it to use async, the easiest solution is probably to leverage PLINQ via AsParallel():

List<string> files = new List<string>()
{
    "MyName",
    "NoName"
};

var results = files.AsParallel().Select(CalculateMD5).ToList();

If you want to restrict the number of threads used for this you can use WithDegreeOfParallelism() as per the example below, which restricts the number of parallel threads to 2:

var results = files.AsParallel().WithDegreeOfParallelism(2).Select(CalculateMD5).ToList();

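Note, however, that if there was such a thing as MD5.ComputeHashAsync() you would certainly want to use that along with async/await and Task.WhenAll(). No such method exists in .NET Framework or .NET Core 3.x; HashAlgorithm.ComputeHashAsync(Stream, CancellationToken) was only added in .NET 5.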
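For readers on .NET 5 or later, where HashAlgorithm.ComputeHashAsync(Stream, CancellationToken) is available, that approach could look roughly like the sketch below. The class name AsyncHashSketch, the method names and the 81920-byte buffer size are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class AsyncHashSketch
{
    // Requires .NET 5 or later for HashAlgorithm.ComputeHashAsync.
    public static async Task<string> CalculateMD5Async(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read,
                                           81920, // Buffer size is an arbitrary choice.
                                           FileOptions.SequentialScan | FileOptions.Asynchronous))
        {
            byte[] hash = await md5.ComputeHashAsync(stream);
            return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
        }
    }

    // Hypothetical helper: hashes every file concurrently and returns the
    // results in the same order as the input file names.
    public static Task<string[]> HashAllAsync(params string[] filenames)
    {
        return Task.WhenAll(filenames.Select(f => CalculateMD5Async(f)));
    }
}
```

It would be called as string[] hashes = await AsyncHashSketch.HashAllAsync("MyName", "NoName");. The earlier caveat still applies: if the work is I/O limited, running the hashes concurrently may not help much.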

Upvotes: 3
