BineG

Reputation: 365

Tasks are not running in parallel

I would like to run some things in parallel but all it does is slow everything down. Here is an example (not my actual problem but it simulates it).

If I run only one iteration the run time is about two seconds, but when I try to run 10 parallel instances suddenly the run time for each is about 20 seconds.

await RunParallel();

private async Task RunParallel()
{
    var tasks = Enumerable.Range(0, 1) // 1 = the single-iteration case; use 10 for the slow parallel case
        .AsParallel()
        .Select(async x =>
        {
            AppLogger.WriteInfo($"task {x}");
            await DoSomeWork();
        })
        .ToList();

    await Task.WhenAll(tasks);
}

private Task DoSomeWork()
{
    return Task.Run(() =>
    {
        Stopwatch workWatch = Stopwatch.StartNew();

        string input = "Just some random string to hash it";
        for (int i = 0; i < 1000000; i++)
        {
            using (SHA1 sha1 = SHA1.Create())
            {
                var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
                var sb = new StringBuilder(hash.Length * 2);

                foreach (byte b in hash)
                {
                    // can be "x2" if you want lowercase
                    sb.Append(b.ToString("X2"));
                }

                input = sb.ToString();
            }
        }

        workWatch.Stop();

        AppLogger.WriteInfo($"Work took {workWatch.Elapsed.TotalSeconds}s");
    });
}

If I replace the hashing code with

while (workWatch.ElapsedMilliseconds < 10000)
    Thread.SpinWait(1000);

or with Task.Delay, it works as expected. What is going on here?

Upvotes: 1

Views: 601

Answers (1)

Marc Gravell

Reputation: 1064044

You're basically profiling the allocator and garbage collector here. There are so many allocations (intermediate strings, new arrays all over the place) that this is basically GC hell, and adding more workers compounds it: the threads fight each other rather than work together.
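A rough way to see this for yourself (a sketch, not a proper benchmark): wrap the question's loop body in a helper and ask the runtime how much the current thread allocated while it ran. HashRepeatedly and Iterations are names made up for this illustration; GC.GetAllocatedBytesForCurrentThread and GC.CollectionCount are the runtime's own counters.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Smaller than the question's 1,000,000 so the sketch finishes quickly.
const int Iterations = 10_000;

long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
int gen0Before = GC.CollectionCount(0);

string result = HashRepeatedly("Just some random string to hash it", Iterations);

long allocated = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
int gen0Collections = GC.CollectionCount(0) - gen0Before;

Console.WriteLine($"Allocated roughly {allocated / Iterations} bytes per iteration");
Console.WriteLine($"Gen 0 collections during the run: {gen0Collections}");
Console.WriteLine($"Final hash: {result}");

// Same loop body as the question: a new SHA1 instance, byte arrays,
// a StringBuilder and several strings on every single iteration.
static string HashRepeatedly(string input, int iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("X2"));
            }
            input = sb.ToString();
        }
    }
    return input;
}
```

The exact numbers depend on the runtime and machine, so treat them as an indication only; the point is that every iteration allocates, and every worker you add multiplies that.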

The fix is simply: don't do that. Be aware that allocation isn't free, and in a tight loop like this it can be a disaster. There are lots of ways to avoid those allocations.

You're also using a synchronous parallel API (PLINQ's AsParallel) to start tasks that wrap work which isn't asynchronous at all. Pick a lane: your work is synchronous, so just use Parallel.For.
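As a minimal sketch of what "pick a lane" could look like when the caller still wants something to await (as the question's `await RunParallel()` does): keep the work itself synchronous inside Parallel.For, and wrap the whole loop in a single Task.Run. DoCpuBoundWork is a hypothetical stand-in for the hashing loop, and the worker count of 4 is arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

int completed = 0;

// A single Task.Run moves the blocking Parallel.For off the calling thread,
// so an async caller can await it. No async lambdas, no Task per item.
await Task.Run(() =>
{
    Parallel.For(0, 4, workerId =>
    {
        Stopwatch watch = Stopwatch.StartNew();
        long checksum = DoCpuBoundWork();
        watch.Stop();

        Interlocked.Increment(ref completed);
        Console.WriteLine($"worker {workerId}: {watch.Elapsed.TotalMilliseconds:F1} ms (checksum {checksum})");
    });
});

Console.WriteLine($"{completed} workers finished");

// Hypothetical CPU-bound placeholder; the real code would do the hashing here.
static long DoCpuBoundWork()
{
    long sum = 0;
    for (int i = 0; i < 5_000_000; i++)
    {
        sum += i % 7;
    }
    return sum;
}
```

If nothing needs to await the result, the Task.Run wrapper can go and Parallel.For can be called directly.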

Also: amortize allocations by reusing objects where possible.

Results first:

(Note: there's still some allocation here, hence the non-linear scaling. To retain compatibility I'm still allocating the final string of each hash in the loop: 1M strings per worker, 40 characters each, so 80 bytes of character data plus the object overhead per string, which comes to over 80MiB per worker. That's non-trivial, but still nothing compared to the allocations in the original, and these string allocations could be avoided too, if we wanted.)

Running 2 workers...
Work took 0.2348809s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2348424s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 4 workers...
Work took 0.2335012s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2368322s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2386911s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2539417s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 6 workers...
Work took 0.2652288s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2692273s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2695841s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2697559s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2763701s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2781451s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 8 workers...
Work took 0.2808767s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2811291s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2835578s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2965003s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3052418s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3064866s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3156862s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3189329s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 10 workers...
Work took 0.3520882s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3547991s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3604287s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3637762s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3688235s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3697951s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3775134s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3854601s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3885827s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3956756s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA

Compared to a modified version of yours (that loses the Task stuff and writes the final hash):

Running 2 workers...
Work took 1.2838492s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.3045196s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 4 workers...
Work took 1.543938s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5462838s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5626509s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5651681s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 6 workers...
Work took 3.6812531s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7369614s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7422675s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7640644s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7649982s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7662394s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 8 workers...
Work took 7.7063103s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7559415s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7966256s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7967291s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7971053s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7995026s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.8031556s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.8117826s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 10 workers...
Work took 6.640342s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 6.8570689s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0663749s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0665148s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0766348s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0905743s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0986777s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1038984s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1090672s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1132156s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA

Code:

// Assumes implicit usings (System, System.Threading.Tasks) are enabled.
using System.Buffers;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

RunParallel();

void RunParallel()
{
    // 2, 4, 6, 8, then 10 workers
    for (int i = 2; i < 12; i += 2)
    {
        Console.WriteLine($"Running {i} workers...");
        Parallel.For(0, i, static _ => DoSomeWork());
    }
}

static void DoSomeWork()
{
    Stopwatch workWatch = Stopwatch.StartNew();

    string input = "Just some random string to hash it";
    byte[] buffer = Array.Empty<byte>();
    Span<byte> hashScratch = stackalloc byte[32]; // actually we only expect 20
    Span<char> hexScratch = stackalloc char[64]; // actually we only expect 40
    using SHA1 sha1 = SHA1.Create();

    for (int i = 0; i < 1000000; i++)
    {
        
        // Worst-case UTF-8 size for this input; GetBytes below returns the actual count.
        var bytes = Encoding.UTF8.GetMaxByteCount(input.Length);
        if (buffer.Length < bytes) Resize(ref buffer, bytes);
        bytes = Encoding.UTF8.GetBytes(input, 0, input.Length, buffer, 0);

        if (sha1.TryComputeHash(new ReadOnlySpan<byte>(buffer, 0, bytes), hashScratch, out var hashLen))
        {
            int cIndex = 0;
            const string HexToChar = "0123456789ABCDEF";
            foreach (var b in hashScratch.Slice(0, hashLen))
            {
                hexScratch[cIndex++] = HexToChar[b >> 4];
                hexScratch[cIndex++] = HexToChar[b & 15];
            }
            input = new string(hexScratch.Slice(0, cIndex));
        }
    }
    Resize(ref buffer, 0); // hand the rented buffer back to the pool

    workWatch.Stop();

    Console.WriteLine($"Work took {workWatch.Elapsed.TotalSeconds}s; final hash: {input}");
}
static void Resize(ref byte[] buffer, int length)
{
    if (buffer.Length > 0)
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
    buffer = length > 0 ? ArrayPool<byte>.Shared.Rent(length) : Array.Empty<byte>();
}
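On the remaining per-iteration string mentioned in the note with the results: one possible way to drop it, sketched here under the assumption that only the final hash needs to be a string. The hex digits are plain ASCII, so their UTF-8 encoding is the same bytes, and they can be written straight back into the input buffer. HashChain is a name invented for this sketch; it is a variation on the code above, not a measured result.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

string final = HashChain("Just some random string to hash it", 1_000_000);
Console.WriteLine($"final hash: {final}");

static string HashChain(string seed, int iterations)
{
    const int HashSize = 20;          // SHA-1 digest length in bytes
    const int HexSize = HashSize * 2; // two hex digits per digest byte
    const string HexDigits = "0123456789ABCDEF";

    // One buffer, big enough for either the UTF-8 seed or the 40 hex digits.
    byte[] data = new byte[Math.Max(Encoding.UTF8.GetMaxByteCount(seed.Length), HexSize)];
    int length = Encoding.UTF8.GetBytes(seed, 0, seed.Length, data, 0);

    Span<byte> hash = stackalloc byte[HashSize];
    using SHA1 sha1 = SHA1.Create();

    for (int i = 0; i < iterations; i++)
    {
        if (!sha1.TryComputeHash(new ReadOnlySpan<byte>(data, 0, length), hash, out int written))
        {
            throw new InvalidOperationException("Hash buffer too small.");
        }

        // Overwrite the input with the uppercase hex of the digest, as ASCII bytes.
        length = 0;
        foreach (byte b in hash.Slice(0, written))
        {
            data[length++] = (byte)HexDigits[b >> 4];
            data[length++] = (byte)HexDigits[b & 15];
        }
    }

    // The only string allocated by the whole chain.
    return Encoding.ASCII.GetString(data, 0, length);
}
```

Since each step hashes exactly the bytes the string version would have produced, this should end on the same final hash as the runs above, with the loop itself no longer allocating per iteration.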

Upvotes: 7
