I would like to run some things in parallel, but all it does is slow everything down. Here is an example (not my actual problem, but it simulates it).
If I run only one iteration, the run time is about two seconds, but when I run 10 parallel instances, the run time for each one suddenly jumps to about 20 seconds.
using System.Diagnostics;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
await RunParallel();
private async Task RunParallel()
{
    // 1 = the single-run case; the slow parallel case uses 10
    var tasks = Enumerable.Range(0, 1)
        .AsParallel()
        .Select(async x =>
        {
            AppLogger.WriteInfo($"task {x}");
            await DoSomeWork();
        })
        .ToList();
    await Task.WhenAll(tasks);
}
private Task DoSomeWork()
{
    return Task.Run(() =>
    {
        Stopwatch workWatch = Stopwatch.StartNew();
        string input = "Just some random string to hash it";
        for (int i = 0; i < 1000000; i++)
        {
            using (SHA1 sha1 = SHA1.Create())
            {
                var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
                var sb = new StringBuilder(hash.Length * 2);
                foreach (byte b in hash)
                {
                    // can be "x2" if you want lowercase
                    sb.Append(b.ToString("X2"));
                }
                input = sb.ToString();
            }
        }
        workWatch.Stop();
        AppLogger.WriteInfo($"Work took {workWatch.Elapsed.TotalSeconds}s");
    });
}
If I replace the hashing code with
while (workWatch.ElapsedMilliseconds < 10000)
    Thread.SpinWait(1000);
or with Task.Delay, it works as expected. What is going on here?
You're basically profiling the allocator and garbage collector here. Every iteration allocates: a new SHA1 instance, the byte[] from Encoding.UTF8.GetBytes, the hash array, a StringBuilder, a two-character string for each of the 20 hash bytes, and the final string. That is GC hell, and adding more workers compounds it, making the threads fight each other rather than work together.
The fix is simple: don't do that. Allocation isn't free, and in a tight loop like this one it can be a disaster. Fortunately, there are lots of ways to avoid those allocations.
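A rough way to check the "profiling the allocator" claim yourself: GC.GetAllocatedBytesForCurrentThread (a real .NET API) reports how many bytes the current thread has allocated so far, so sampling it around the question's loop body gives an approximate per-iteration figure. MeasureOriginalLoop and the 10,000-iteration count are illustrative choices, not code from the answer; treat the printed numbers as indicative only.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Runs the question's loop body 'iterations' times and returns the bytes this
// thread allocated while doing so. MeasureOriginalLoop is a hypothetical helper.
static long MeasureOriginalLoop(int iterations)
{
    string input = "Just some random string to hash it";
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < iterations; i++)
    {
        using (SHA1 sha1 = SHA1.Create())                              // new hasher object
        {
            var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input)); // two new byte[]s
            var sb = new StringBuilder(hash.Length * 2);                // new builder + buffer
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("X2"));                            // new 2-char string per byte
            }
            input = sb.ToString();                                      // new final string
        }
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int Iterations = 10_000;
long allocated = MeasureOriginalLoop(Iterations);
Console.WriteLine($"~{allocated / Iterations} bytes allocated per iteration");
Console.WriteLine($"Gen0 collections since process start: {GC.CollectionCount(0)}");
```

Multiply the per-iteration figure by 1,000,000 iterations and by the number of workers to get a feel for how much garbage the parallel version produces.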
You're also using a synchronous parallel API (PLINQ's AsParallel) to start tasks that then do purely synchronous, CPU-bound work. Pick a lane: your work is synchronous, so just use Parallel.For.
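For contrast, a minimal sketch of the other lane, assuming the per-item work really is asynchronous: Task.Delay stands in for I/O here, and DoSomeAsyncWork is a hypothetical name. The tasks are started directly and awaited with Task.WhenAll; no AsParallel is involved.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical async work item: Task.Delay simulates waiting on I/O.
static async Task<TimeSpan> DoSomeAsyncWork()
{
    var watch = Stopwatch.StartNew();
    await Task.Delay(200);   // waits without occupying a thread
    return watch.Elapsed;
}

// Start all ten operations directly, then await them together.
var total = Stopwatch.StartNew();
Task<TimeSpan>[] tasks = Enumerable.Range(0, 10).Select(_ => DoSomeAsyncWork()).ToArray();
TimeSpan[] durations = await Task.WhenAll(tasks);
Console.WriteLine($"{durations.Length} tasks; slowest {durations.Max().TotalSeconds:F2}s; total {total.Elapsed.TotalSeconds:F2}s");
```

Because the delays overlap, the total should be close to one delay rather than ten. That only holds for work that waits; CPU-bound work like the hashing loop belongs in the Parallel.For lane.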
Also: amortize allocations by reusing objects where possible (for example, one SHA1 instance per worker instead of one per iteration).
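A middle-ground sketch of that idea, short of the span/ArrayPool version used further down: one SHA1 and one StringBuilder per worker, reused across iterations, and hex digits appended as chars instead of via b.ToString("X2"). HashChainReusing is a hypothetical name. It still allocates the UTF-8 byte[], the hash byte[] and the result string on every iteration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: the question's hash chain with object reuse.
static string HashChainReusing(string input, int iterations)
{
    const string Hex = "0123456789ABCDEF";
    using SHA1 sha1 = SHA1.Create();  // created once; ComputeHash resets its state each call
    var sb = new StringBuilder(40);   // SHA-1 = 20 bytes = 40 hex chars; buffer is reused
    for (int i = 0; i < iterations; i++)
    {
        // Still allocates: the UTF-8 byte[] and the hash byte[].
        byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
        sb.Clear();                   // keeps the builder's capacity
        foreach (byte b in hash)
        {
            // Append chars directly; b.ToString("X2") would allocate a string per byte.
            sb.Append(Hex[b >> 4]).Append(Hex[b & 15]);
        }
        input = sb.ToString();        // one new string per iteration remains
    }
    return input;
}

Console.WriteLine(HashChainReusing("Just some random string to hash it", 1_000_000));
```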
Results first:
(Note: there's still some allocation here, hence the non-linear stacking. To keep the output compatible, I'm still allocating the final string of each hash in the loop: that's 1M strings per worker, each 40 characters, so 80 bytes of character data plus the string object's own overhead, which comes to over 80 MiB per worker. That's non-trivial, but still nothing compared to the allocations in the original; these string allocations could be avoided too, if we wanted.)
Running 2 workers...
Work took 0.2348809s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2348424s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 4 workers...
Work took 0.2335012s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2368322s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2386911s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2539417s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 6 workers...
Work took 0.2652288s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2692273s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2695841s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2697559s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2763701s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2781451s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 8 workers...
Work took 0.2808767s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2811291s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2835578s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.2965003s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3052418s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3064866s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3156862s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3189329s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 10 workers...
Work took 0.3520882s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3547991s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3604287s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3637762s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3688235s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3697951s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3775134s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3854601s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3885827s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 0.3956756s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Compared to a modified version of yours (which drops the Task plumbing and prints the final hash):
Running 2 workers...
Work took 1.2838492s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.3045196s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 4 workers...
Work took 1.543938s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5462838s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5626509s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 1.5651681s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 6 workers...
Work took 3.6812531s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7369614s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7422675s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7640644s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7649982s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 3.7662394s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 8 workers...
Work took 7.7063103s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7559415s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7966256s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7967291s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7971053s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.7995026s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.8031556s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.8117826s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Running 10 workers...
Work took 6.640342s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 6.8570689s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0663749s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0665148s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0766348s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0905743s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.0986777s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1038984s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1090672s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Work took 7.1132156s; final hash: 5C1AEFCEF5374768A470C2707AFD2A5AC404CDEA
Code:
using System;
using System.Buffers;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
RunParallel();
void RunParallel()
{
    for (int i = 2; i < 12; i += 2)
    {
        Console.WriteLine($"Running {i} workers...");
        Parallel.For(0, i, static _ => DoSomeWork());
    }
}
static void DoSomeWork()
{
    Stopwatch workWatch = Stopwatch.StartNew();
    string input = "Just some random string to hash it";
    byte[] buffer = Array.Empty<byte>();
    Span<byte> hashScratch = stackalloc byte[32]; // actually we only expect 20
    Span<char> hexScratch = stackalloc char[64]; // actually we only expect 40
    using SHA1 sha1 = SHA1.Create(); // one hasher per worker, reused every iteration
    for (int i = 0; i < 1000000; i++)
    {
        var bytes = Encoding.UTF8.GetMaxByteCount(input.Length); // upper bound, not the exact count
        if (buffer.Length < bytes) Resize(ref buffer, bytes);
        bytes = Encoding.UTF8.GetBytes(input, 0, input.Length, buffer, 0); // actual byte count
        if (sha1.TryComputeHash(new ReadOnlySpan<byte>(buffer, 0, bytes), hashScratch, out var hashLen))
        {
            // hand-rolled uppercase hex encoding into the stack buffer
            int cIndex = 0;
            const string HexToChar = "0123456789ABCDEF";
            foreach (var b in hashScratch.Slice(0, hashLen))
            {
                hexScratch[cIndex++] = HexToChar[b >> 4];
                hexScratch[cIndex++] = HexToChar[b & 15];
            }
            input = new string(hexScratch.Slice(0, cIndex)); // the remaining per-iteration allocation
        }
    }
    Resize(ref buffer, 0); // return the rented buffer to the pool
    workWatch.Stop();
    Console.WriteLine($"Work took {workWatch.Elapsed.TotalSeconds}s; final hash: {input}");
}
static void Resize(ref byte[] buffer, int length)
{
    // swap the current buffer for a pooled one of at least 'length' bytes (or none)
    if (buffer.Length > 0)
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
    buffer = length > 0 ? ArrayPool<byte>.Shared.Rent(length) : Array.Empty<byte>();
}
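The note above says the per-iteration string allocation could be avoided too. One way to do that, sketched under the assumption that only the final hash is needed as a string: after the first round, every input is 40 uppercase hex characters, and those encode to one UTF-8 byte each (their ASCII codes). So the loop can hash and re-encode raw bytes and build a string once at the end. HashChainNoStrings is a hypothetical name, not code from the answer.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: same hash chain, no allocations inside the loop.
static string HashChainNoStrings(string seed, int iterations)
{
    if (iterations <= 0) return seed;
    const string Hex = "0123456789ABCDEF";
    using SHA1 sha1 = SHA1.Create();
    byte[] hash = new byte[20];       // SHA-1 digest size, allocated once
    byte[] hexAscii = new byte[40];   // two ASCII hex digits per digest byte, allocated once
    // The seed may contain non-ASCII text, so encode it as UTF-8 once, up front.
    ReadOnlySpan<byte> current = Encoding.UTF8.GetBytes(seed);
    for (int i = 0; i < iterations; i++)
    {
        if (!sha1.TryComputeHash(current, hash, out int written) || written != hash.Length)
            throw new InvalidOperationException("unexpected SHA-1 output");
        for (int j = 0; j < hash.Length; j++)
        {
            hexAscii[2 * j] = (byte)Hex[hash[j] >> 4];     // high nibble
            hexAscii[2 * j + 1] = (byte)Hex[hash[j] & 15]; // low nibble
        }
        // These bytes are exactly what Encoding.UTF8.GetBytes would produce
        // for the 40-character hex string, so hash them directly next round.
        current = hexAscii;
    }
    return Encoding.ASCII.GetString(hexAscii); // the only string built from the chain
}

Console.WriteLine(HashChainNoStrings("Just some random string to hash it", 1_000_000));
```

The trade-off is that the intermediate hashes never exist as strings, so this only fits if nothing else in the loop needs them as text.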