Reputation: 3314
I have a C# list which contains around 8000 items (file paths). I want to run a method on all of these items in parallel. For this I have the below 2 options:
1) Manually divide the list into small chunks (say of size 500 each), create an array of actions for these small lists, and then call Parallel.Invoke like below:
var partitionedLists = MainList.DivideIntoChunks(500);
List<Action> actions = new List<Action>();
foreach (var lst in partitionedLists)
{
    actions.Add(() => CallMethod(lst));
}
Parallel.Invoke(actions.ToArray());
2) Second option is to run Parallel.ForEach like below
Parallel.ForEach(MainList, item => { CallMethod(item); });
Please suggest, thanks in advance.
Upvotes: 1
Views: 2592
Reputation: 43495
The second option is preferable:
ParallelOptions options = new()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Partitioner<Item> partitioner = Partitioner.Create(MainList);
Parallel.ForEach(partitioner, options, item =>
{
    CallMethod(item);
});
The main advantage is that in case the CallMethod fails for an item, the parallel operation will complete as soon as possible: you won't have to wait for all items in the MainList to be invoked before getting the exception. By contrast, Parallel.Invoke invariably invokes all the actions.
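This fail-fast behavior can be sketched as follows (a minimal demo with a hypothetical list of integers; index 10 is chosen arbitrarily as the failing item):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class FailFastDemo
{
    // Returns how many items were processed before the loop observed the fault.
    public static int Run()
    {
        var items = Enumerable.Range(0, 8000).ToList();
        int processed = 0;
        try
        {
            Parallel.ForEach(items, item =>
            {
                if (item == 10)
                    throw new InvalidOperationException($"Item {item} failed");
                Interlocked.Increment(ref processed);
            });
        }
        catch (AggregateException ex)
        {
            // Parallel.ForEach stops scheduling new iterations once a fault
            // occurs, so far fewer than 8000 items end up processed.
            Console.WriteLine($"Caught '{ex.InnerException?.Message}' after ~{processed} items");
        }
        return processed;
    }

    static void Main() => Run();
}
```

With Parallel.Invoke over pre-built chunk actions, every action runs to completion before the AggregateException surfaces.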
The Partitioner.Create is useful if you want to process the MainList in roughly ascending order. Alternatively you can pass the MainList directly as the source of the Parallel.ForEach, in which case the order of execution will be a bit chaotic. The default partitioning for List<T>s is range partitioning, meaning that each worker thread is assigned a different large range of the list and processes that range incrementally. With the Partitioner.Create you get chunk partitioning: each worker thread grabs a small chunk of items from the list, processes it, and then comes back to retrieve the next chunk, and so on.
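The difference can be observed with a small sketch (hypothetical integer data; note that the single-argument Partitioner.Create(IEnumerable<T>) overload is the one that produces chunk partitioning):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class PartitioningDemo
{
    // Records the order in which the parallel loop picks up items.
    public static int[] PickupOrder(bool chunkPartitioning)
    {
        var list = Enumerable.Range(0, 100).ToList();
        var order = new ConcurrentQueue<int>();

        if (chunkPartitioning)
            // Single-argument overload: chunk partitioning over IEnumerable<T>.
            Parallel.ForEach(Partitioner.Create(list), i => order.Enqueue(i));
        else
            // List passed directly: range partitioning.
            Parallel.ForEach(list, i => order.Enqueue(i));

        return order.ToArray();
    }

    static void Main()
    {
        Console.WriteLine("Range: " + string.Join(",", PickupOrder(false).Take(16)));
        Console.WriteLine("Chunk: " + string.Join(",", PickupOrder(true).Take(16)));
    }
}
```

With range partitioning the first items picked up typically come from several widely separated ranges of the list; with chunk partitioning they cluster near the front.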
Upvotes: 0
Reputation: 1517
The first option is a form of task-parallelism, in which you divide your task into a group of sub-tasks and execute them in parallel. As is obvious from the code you provided, you are responsible for choosing the level of granularity (the chunk size) when creating the sub-tasks. If one does not rely on appropriate heuristics, the selected granularity might be too coarse or too fine, and the resulting performance gain might not be significant. Task-parallelism is used in scenarios where the operation to be performed takes a similar amount of time for all input values.
The second option is a form of data-parallelism, in which the input data is divided into smaller chunks based on the number of hardware threads/cores/processors available, and each chunk is then processed in isolation. In this case, the .NET library chooses the right level of granularity for you and ensures better CPU utilization. Conventionally, data-parallelism is used in scenarios where the time taken by the operation can vary depending on the input value.
In conclusion, if your operation is more or less uniform over the range of input values and you know the right granularity (chunk size), go ahead with the first option. If that's not the case, or if you are unsure about the above questions, go with the second option, which usually pans out better in most scenarios.
NOTE: If this is a very performance-critical component of your application, I would advise benchmarking both approaches in a production-like environment to get more data, in addition to the above recommendations.
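A minimal benchmarking sketch along those lines (assumptions: CallMethod is simulated with a short sleep, Enumerable.Chunk from .NET 6+ stands in for the asker's custom DivideIntoChunks helper, and Stopwatch timings are only indicative; a serious comparison should use a tool like BenchmarkDotNet):

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelComparison
{
    // Stand-in workload: simulates a short I/O-bound operation per file path.
    static void CallMethod(int item) => Thread.Sleep(1);

    static void Main()
    {
        var mainList = Enumerable.Range(0, 2000).ToList();

        // Option 1: manual chunking + Parallel.Invoke.
        var sw = Stopwatch.StartNew();
        var actions = mainList
            .Chunk(500) // .NET 6+; stands in for DivideIntoChunks(500)
            .Select(chunk => (Action)(() => { foreach (var i in chunk) CallMethod(i); }))
            .ToArray();
        Parallel.Invoke(actions);
        Console.WriteLine($"Parallel.Invoke over chunks: {sw.ElapsedMilliseconds} ms");

        // Option 2: Parallel.ForEach over the whole list.
        sw.Restart();
        Parallel.ForEach(mainList, CallMethod);
        Console.WriteLine($"Parallel.ForEach: {sw.ElapsedMilliseconds} ms");
    }
}
```

Note that with 2000 items in 500-item chunks, option 1 produces only four actions and therefore caps the parallelism at four threads, which is one reason its numbers tend to lag option 2 on machines with more cores.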
Upvotes: 4