Reputation: 247
I want to process a list of 5000 items. For each item, the processing can be very quick (1 sec) or take a long time (>1 min). I want to get through the list as fast as possible.
I can't put these 5000 items in the .NET ThreadPool, and I also need to know when all the items have been processed, so I was thinking of having a fixed number of threads and doing:
foreach (var item in items)
{
    // wait for a Thread to be available
    // give the item to process to the Thread
}
But what is the easiest way to do that in C#? Should I use Threads, or are there higher-level classes that I could use?
Upvotes: 1
Views: 638
Reputation: 29207
I agree with the answers recommending Parallel.ForEach. Without knowing all of the specifics (like what's going on in the loop) I can't say for certain, but as long as the iterations aren't doing anything that conflicts with each other (like concurrent operations on some shared object that isn't thread safe), it should be fine.
You mentioned in a comment that it's throwing an exception. That can be a problem: if one iteration throws, the loop stops starting new iterations and rethrows the exception wrapped in an AggregateException, leaving your items only partially processed.
To avoid that, handle exceptions within each iteration of the loop. For example,
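// needs: using System.Collections.Concurrent; using System.Threading.Tasks;
var exceptions = new ConcurrentQueue<Exception>();
Parallel.ForEach(items, i =>
{
    try
    {
        // Your code to do whatever
    }
    catch (Exception ex)
    {
        // Record the failure so the other iterations keep running
        exceptions.Enqueue(ex);
    }
});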
By using a ConcurrentQueue, any iteration can safely add its own exception. When the loop is done you have a collection of exceptions, and you can decide what to do with them. You could throw a new exception:
if (exceptions.Count > 0) throw new AggregateException(exceptions);
Or, if there's something that uniquely identifies each item, you could do (for example)
var exceptions = new ConcurrentDictionary<Guid, Exception>();
And then when an exception is thrown,
exceptions.TryAdd(item.Id, ex); //making up the Id property
Now you know specifically which items succeeded and which failed.
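Putting those pieces together, here is a rough sketch of how it might look end to end. The WorkItem type, its Id and Value properties, and the BatchRunner.ProcessAll helper are all made up for illustration; substitute your own item type and processing code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical item type; Id and Value are invented for this sketch.
public sealed class WorkItem
{
    public Guid Id { get; } = Guid.NewGuid();
    public int Value { get; set; }
}

public static class BatchRunner
{
    // Runs 'process' on every item in parallel and returns the failures,
    // keyed by the Id of the item that caused them.
    public static ConcurrentDictionary<Guid, Exception> ProcessAll(
        IEnumerable<WorkItem> items, Action<WorkItem> process)
    {
        var failures = new ConcurrentDictionary<Guid, Exception>();

        // Parallel.ForEach blocks until every iteration has finished.
        Parallel.ForEach(items, item =>
        {
            try
            {
                process(item);
            }
            catch (Exception ex)
            {
                // Record the failure and let the remaining items continue.
                failures.TryAdd(item.Id, ex);
            }
        });

        return failures;
    }
}
```

After ProcessAll returns, any item whose Id is not in the returned dictionary was processed without throwing, and the ones that are in it can be retried or logged.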
Upvotes: 0
Reputation: 5735
To do parallel processing, this is the structure to use:
Parallel.ForEach(items, (item) =>
{
    ....
});
If you don't want to overload the thread pool, you can limit the concurrency with ParallelOptions:
var po = new ParallelOptions
{
    MaxDegreeOfParallelism = 5
};
Parallel.ForEach(items, po, (item) =>
{
    ....
});
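If you want to convince yourself that the limit is respected, something like the sketch below might help. BoundedRunner.Run is a made-up helper, not part of the framework: it runs the loop with a cap and reports the highest number of iterations it saw running at the same time. Since Parallel.ForEach only returns once every iteration has finished, the return of Run also tells you that all the items are done.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedRunner
{
    // Hypothetical helper: processes 'items' with at most 'maxParallel'
    // iterations in flight and returns the peak concurrency observed.
    public static int Run<T>(IEnumerable<T> items, int maxParallel, Action<T> process)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        var gate = new object();
        int running = 0;
        int peak = 0;

        Parallel.ForEach(items, options, item =>
        {
            // Count this iteration as running and update the peak.
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }

            try
            {
                process(item);
            }
            finally
            {
                // Always decrement, even if 'process' throws.
                lock (gate) { running--; }
            }
        });

        return peak;
    }
}
```

For example, BoundedRunner.Run(items, 5, item => Process(item)) should never report a peak above 5, where Process stands in for your own per-item code.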
Upvotes: 2
Reputation: 150108
I would start with Parallel.ForEach and measure your performance. That is a simple, powerful approach and the scheduling does a pretty decent job for a generic scheduler.
Parallel.ForEach(items, i => { /* your code here */ });
"I can't put this 5000 items in the .NET ThreadPool"
Nor would you want to. It is relatively expensive to create a thread, and context switches take time. If you had, say, 8 cores processing 5000 threads, a meaningful fraction of your execution time would be spent on context switches.
Upvotes: 2