Reputation: 593
I have an application which converts some data; often there are 1,000-30,000 files.
I need to do 3 steps:
1. Create a temp file for each source file (createtempFile)
2. Convert the item (convert)
3. Clean up (clean)
So all three steps include some I/O, and I used async/await methods:
var tasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file).ConfigureAwait(false);
    await convert(item).ConfigureAwait(false);
    await clean(item).ConfigureAwait(false);
}).ToList();
await Task.WhenAll(tasks).ConfigureAwait(false);
I don't know if this is best practice, because I create more than a thousand tasks. I thought about splitting the three steps, like this:
List<Item> items = new List<Item>();
var createTasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file, ext).ConfigureAwait(false);
    lock (items)
        items.Add(item);
}).ToList();
await Task.WhenAll(createTasks).ConfigureAwait(false);

var convertTasks = items.Select(async (item) =>
{
    await convert(item, baseAddress, ext).ConfigureAwait(false);
}).ToList();
await Task.WhenAll(convertTasks).ConfigureAwait(false);

var cleanTasks = items.Select(async (item) =>
{
    await clean(targetFile, item.Doctype, ext).ConfigureAwait(false);
}).ToList();
await Task.WhenAll(cleanTasks).ConfigureAwait(false);
But that doesn't seem to be better or faster, because I still create thousands of tasks, three times over.
Should I throttle the creation of tasks? Like chunks of 100 tasks? Or am I overthinking it, and creating thousands of tasks is just fine?
The CPU idles at a 2-4% peak, so I wondered whether there are too many awaits or context switches.
Maybe there are too many WebRequest calls, because the web server/web service can't handle thousands of requests simultaneously, and I should throttle only the WebRequests?
I already increased the .NET maxconnection in the app.config file.
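For illustration, the kind of throttle I have in mind with SemaphoreSlim would still create all the tasks up front, but let only a fixed number run concurrently. A minimal sketch (the limit of 100 just mirrors the chunk size mentioned above):
using System.Threading; // for SemaphoreSlim

var throttle = new SemaphoreSlim(100); // at most 100 files in flight at once
var tasks = files.Select(async (file) =>
{
    await throttle.WaitAsync().ConfigureAwait(false);
    try
    {
        Item item = await createtempFile(file).ConfigureAwait(false);
        await convert(item).ConfigureAwait(false);
        await clean(item).ConfigureAwait(false);
    }
    finally
    {
        throttle.Release();
    }
}).ToList();
await Task.WhenAll(tasks).ConfigureAwait(false);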
Upvotes: 3
Views: 3390
Reputation: 1511
It is possible to execute async operations in parallel while limiting the number of concurrent operations. There is a handy extension method for that; it is not part of the .NET Framework.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EnumerableExtensions
{
    /// <summary>
    /// Enumerates a collection in parallel and calls an async method on each item. Useful for making
    /// parallel async calls, e.g. independent web requests, when the degree of parallelism needs to be
    /// limited.
    /// </summary>
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> action)
    {
        // Split the source into degreeOfParallelism partitions; each partition is
        // drained sequentially by its own task.
        return Task.WhenAll(
            Partitioner.Create(source)
                .GetPartitions(degreeOfParallelism)
                .Select(partition => Task.Run(async () =>
                {
                    using (partition)
                        while (partition.MoveNext())
                            await action(partition.Current);
                })));
    }
}
Call it like this:
var files = new List<string> { "one", "two", "three" };
await files.ForEachAsync(5, async file =>
{
    // do async stuff here with the file
    await Task.Delay(1000);
});
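Note that Partitioner.Create over an IEnumerable<T> hands items to the partitions on demand (chunked dynamic load balancing), so the partitions stay reasonably balanced even when individual items take very different amounts of time to process.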
Upvotes: 15
Reputation: 457127
As commenters have correctly noted, you're overthinking it. The .NET runtime has absolutely no problem tracking thousands of tasks.
However, you might want to consider using a TPL Dataflow pipeline, which would enable you to easily have different concurrency levels for different operations ("blocks") in your pipeline.
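For instance, here is a rough sketch of such a pipeline for the three steps in your question. It assumes the createtempFile/convert/clean signatures from your first snippet, and the MaxDegreeOfParallelism values are arbitrary placeholders you would tune:
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

// Each block gets its own concurrency limit.
var createBlock = new TransformBlock<string, Item>(
    file => createtempFile(file),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });
var convertBlock = new TransformBlock<Item, Item>(
    async item => { await convert(item); return item; },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
var cleanBlock = new ActionBlock<Item>(
    item => clean(item),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

// Link the blocks so items flow through, and propagate completion down the pipeline.
createBlock.LinkTo(convertBlock, new DataflowLinkOptions { PropagateCompletion = true });
convertBlock.LinkTo(cleanBlock, new DataflowLinkOptions { PropagateCompletion = true });

// Feed in all the files, signal the end of input, and wait for the pipeline to drain.
foreach (var file in files)
    await createBlock.SendAsync(file);
createBlock.Complete();
await cleanBlock.Completion;
This way the web-request-heavy step can run with a low limit while the local file I/O steps run with higher ones, and items move to the next step as soon as they are ready instead of waiting for a whole batch.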
Upvotes: 7