Jerry Nixon

Reputation: 31813

C# Asynchronous Options for Processing a List

I am trying to better understand the Async and the Parallel options I have in C#. In the snippets below, I have included the 5 approaches I come across most. But I am not sure which to choose - or better yet, what criteria to consider when choosing:

Method 1: Task

(see http://msdn.microsoft.com/en-us/library/dd321439.aspx)

Calling StartNew is functionally equivalent to creating a Task using one of its constructors and then calling Start to schedule it for execution. However, unless creation and scheduling must be separated, StartNew is the recommended approach for both simplicity and performance.

TaskFactory's StartNew method should be the preferred mechanism for creating and scheduling computational tasks, but for scenarios where creation and scheduling must be separated, the constructors may be used, and the task's Start method may then be used to schedule the task for execution at a later time.

// using System.Threading.Tasks.Task.Factory
void Do_1()
{
    var _List = GetList();
    // StartNew(Action) takes a parameterless lambda; the one-parameter form needs a state argument.
    _List.ForEach(i => Task.Factory.StartNew(() => DoSomething(i)));
}

Method 2: QueueUserWorkItem

(see http://msdn.microsoft.com/en-us/library/system.threading.threadpool.getmaxthreads.aspx)

You can queue as many thread pool requests as system memory allows. If there are more requests than thread pool threads, the additional requests remain queued until thread pool threads become available.

You can place data required by the queued method in the instance fields of the class in which the method is defined, or you can use the QueueUserWorkItem(WaitCallback, Object) overload that accepts an object containing the necessary data.

// using System.Threading.ThreadPool
void Do_2()
{
    var _List = GetList();
    var _Action = new WaitCallback((o) => { DoSomething(o); });
    // Pass each item as the state argument; otherwise the callback receives null.
    _List.ForEach(x => ThreadPool.QueueUserWorkItem(_Action, x));
}

Method 3: Parallel.Foreach

(see: http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.foreach.aspx)

The Parallel class provides library-based data parallel replacements for common operations such as for loops, for each loops, and execution of a set of statements.

The body delegate is invoked once for each element in the source enumerable. It is provided with the current element as a parameter.

// using System.Threading.Tasks.Parallel
void Do_3()
{
    var _List = GetList();
    var _Action = new Action<object>((o) => { DoSomething(o); });
    Parallel.ForEach(_List, _Action);
}

Method 4: IAsync.BeginInvoke

(see: http://msdn.microsoft.com/en-us/library/cc190824.aspx)

BeginInvoke is asynchronous; therefore, control returns immediately to the calling object after it is called.

// using IAsync.BeginInvoke()
void Do_4()
{
    var _List = GetList();
    var _Action = new Action<object>((o) => { DoSomething(o); });
    _List.ForEach(x => _Action.BeginInvoke(x, null, null));
}

Method 5: BackgroundWorker

(see: http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker.aspx)

To set up for a background operation, add an event handler for the DoWork event. Call your time-consuming operation in this event handler. To start the operation, call RunWorkerAsync. To receive notifications of progress updates, handle the ProgressChanged event. To receive a notification when the operation is completed, handle the RunWorkerCompleted event.

// using System.ComponentModel.BackgroundWorker
void Do_5()
{
    var _List = GetList();
    // No using block: the worker is still running when Do_5 returns.
    var _Worker = new BackgroundWorker();
    _Worker.DoWork += (s, arg) =>
    {
        arg.Result = arg.Argument;
        DoSomething(arg.Argument);
    };
    _Worker.RunWorkerCompleted += (s, arg) =>
    {
        // Process the items one at a time until the list is empty.
        _List.Remove(arg.Result);
        if (_List.Any())
            _Worker.RunWorkerAsync(_List[0]);
    };
    if (_List.Any())
        _Worker.RunWorkerAsync(_List[0]);
}

I suppose the obvious criteria would be:

  1. Is any better than the other for performance?
  2. Is any better than the other for error handling?
  3. Is any better than the other for monitoring/feedback?

But, how do you choose? Thanks in advance for your insights.

Upvotes: 35

Views: 10171

Answers (4)

RandomEngy

Reputation: 15413

Going to take these in an arbitrary order:

BackgroundWorker (#5)
I like to use BackgroundWorker when I'm doing things with a UI. Its advantage is that the progress and completion events fire on the UI thread, which means you don't get nasty exceptions when you try to change UI elements. It also has a nice built-in way of reporting progress. One disadvantage is that if you have blocking calls (like web requests) in your work, you'll have a thread sitting around doing nothing while the call is pending. This is probably not a problem if you expect to have only a handful of them, though.
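
To illustrate the progress and completion events, here is a minimal sketch. The names WorkerExample and Start, and the onProgress/onCompleted callbacks, are made up for this example. Unlike the question's Do_5, it processes the whole list inside a single DoWork handler.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

static class WorkerExample
{
    // Hypothetical helper: runs 'work' over every item on a background thread.
    public static BackgroundWorker Start<T>(
        IList<T> items, Action<T> work,
        Action<int> onProgress, Action<Exception> onCompleted)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };

        worker.DoWork += (s, e) =>
        {
            for (int i = 0; i < items.Count; i++)
            {
                work(items[i]);
                // Report the percentage of items finished so far.
                worker.ReportProgress((i + 1) * 100 / items.Count);
            }
        };

        // Raised through the SynchronizationContext captured by RunWorkerAsync
        // (the UI thread in WinForms/WPF).
        worker.ProgressChanged += (s, e) => onProgress(e.ProgressPercentage);

        // e.Error holds any exception thrown inside DoWork, or null on success.
        worker.RunWorkerCompleted += (s, e) => onCompleted(e.Error);

        worker.RunWorkerAsync();
        return worker;
    }
}
```

In a console app there is no UI SynchronizationContext, so both events are raised on thread pool threads instead.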

IAsyncResult/Begin/End (APM, #4)
This is a widespread and powerful model, but a difficult one to use. Error handling is troublesome since you need to re-catch exceptions on the End call, and uncaught exceptions won't necessarily make it back to any code that can handle them. This carries the danger of permanently hanging requests in ASP.NET, or of errors mysteriously disappearing in other applications. You also have to be vigilant about the CompletedSynchronously property. If you don't track and report it properly, the program can hang and leak resources. The flip side is that if you're running inside the context of another APM operation, you have to make sure that any async methods you call also report this value. That means doing another APM call, or using a Task and casting it to an IAsyncResult to get at its CompletedSynchronously property.

There's also a lot of overhead in the signatures: you have to accept an arbitrary state object to pass through, and if you're writing an async method you have to provide your own IAsyncResult implementation that supports polling and wait handles (even if callers only use the callback). By the way, you should only be using the callback here. When you block on the wait handle or poll IsCompleted, you're wasting a thread while the operation is pending.
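
As a rough sketch of the callback-only style and its error handling, here is the Begin/End pair on Stream, next to the same read wrapped in a Task with FromAsync. ApmExample, ReadWithApm and ReadWithTask are names invented for this example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class ApmExample
{
    // Raw APM: the callback has to call EndRead, and that call is where
    // a failure of the operation is rethrown, so it needs its own try/catch.
    public static void ReadWithApm(Stream stream, byte[] buffer, Action<int, Exception> done)
    {
        stream.BeginRead(buffer, 0, buffer.Length, ar =>
        {
            try
            {
                int read = stream.EndRead(ar);
                done(read, null);
            }
            catch (Exception ex)
            {
                done(0, ex);
            }
        }, null);
    }

    // The same Begin/End pair wrapped in a Task; a failure ends up in the
    // task's Exception property instead of being rethrown in a callback.
    public static Task<int> ReadWithTask(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }
}
```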

Event-based Asynchronous Pattern (EAP)
One that was not on your list but I'll mention for the sake of completeness. It's a little bit friendlier than the APM. There are events instead of callbacks and there's less junk hanging onto the method signatures. Error handling is a little easier since it's saved and available in the callback rather than re-thrown. CompletedSynchronously is also not part of the API.

Tasks (#1)
Tasks are another friendly async API. Error handling is straightforward: the exception is always there for inspection on the callback, and nobody cares about CompletedSynchronously. You can express dependencies between tasks, and it's a great way to coordinate the execution of multiple async operations. You can even wrap APM or EAP async methods in Tasks, which makes it easy to mix the two. Another good thing about Tasks is that the consuming code doesn't care how the operation is implemented: it may block a thread or be totally asynchronous.
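
A minimal sketch of the error handling, assuming you want to start one task per list item and collect the failures afterwards. TaskExample and ProcessAll are hypothetical names; the 'work' delegate stands in for the question's DoSomething.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class TaskExample
{
    // Starts one task per item, waits for all of them, and returns
    // whatever exceptions the work threw (empty if everything succeeded).
    public static IList<Exception> ProcessAll<T>(IEnumerable<T> items, Action<T> work)
    {
        Task[] tasks = items
            .Select(item => Task.Factory.StartNew(() => work(item)))
            .ToArray();

        try
        {
            Task.WaitAll(tasks);
            return new List<Exception>();
        }
        catch (AggregateException ex)
        {
            // Flatten unwraps nested AggregateExceptions into the original exceptions.
            return ex.Flatten().InnerExceptions.ToList();
        }
    }
}
```

Task.WaitAll blocks the calling thread, so this form suits a console app or a background thread better than a UI thread.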

Parallel.For methods (#3)
These are additional helpers on top of Tasks. They can do some of the work to create tasks for you and make your code more readable, if your async tasks are suited to run in a loop.
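
For example, a loop over a list might look like the sketch below. ParallelExample and SumOfSquares are made-up names, and squaring stands in for real per-item work.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ParallelExample
{
    // Squares every item in parallel and adds the results together.
    public static int SumOfSquares(IEnumerable<int> items)
    {
        int total = 0;

        // ForEach returns only after every iteration has finished.
        Parallel.ForEach(items, item =>
        {
            // The body runs on several threads at once, so shared state needs
            // an atomic update.
            Interlocked.Add(ref total, item * item);
        });

        return total;
    }
}
```

Exceptions thrown by the body are collected and rethrown from Parallel.ForEach as an AggregateException.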

ThreadPool.QueueUserWorkItem (#2)
This is a low-level utility that's actually used by ASP.NET for all requests. It doesn't have any built-in error handling like tasks so you have to catch everything and pipe it back up to your app if you want to know about it. It's suitable for CPU-intensive work but you don't want to put any blocking calls on it, such as a synchronous web request. That's because as long as it runs, it's using up a thread.
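
Here is a rough sketch of what "catch everything and pipe it back up" can look like. ThreadPoolExample and ProcessAll are hypothetical names, and using a CountdownEvent to wait for completion is one choice among several.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ThreadPoolExample
{
    // Queues one work item per list element and returns the exceptions thrown.
    public static IList<Exception> ProcessAll<T>(IList<T> items, Action<T> work)
    {
        var errors = new ConcurrentQueue<Exception>();

        using (var pending = new CountdownEvent(items.Count))
        {
            foreach (T item in items)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    // Nothing catches exceptions for us here, so catch them
                    // ourselves and hand them back to the caller.
                    try { work((T)state); }
                    catch (Exception ex) { errors.Enqueue(ex); }
                    finally { pending.Signal(); }
                }, item);
            }

            // Blocks the calling thread until every work item has signalled.
            pending.Wait();
        }

        return errors.ToList();
    }
}
```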

async / await Keywords
New in .NET 4.5, these keywords let you write async code without explicit callbacks. You can await a Task, and the code after the await resumes once that async operation completes, without blocking a thread while it waits.
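
Applied to a list, that could look like the following sketch. AsyncExample, ProcessAllAsync and DoSomethingAsync are invented for this example, with Task.Delay standing in for real asynchronous I/O.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class AsyncExample
{
    // Starts the async operation for every item and awaits them all together.
    public static async Task<int[]> ProcessAllAsync(IEnumerable<int> items)
    {
        IEnumerable<Task<int>> pending = items.Select(item => DoSomethingAsync(item));

        // WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(pending);
    }

    // Stand-in for a real async operation such as a web request.
    static async Task<int> DoSomethingAsync(int item)
    {
        await Task.Delay(10); // no thread is blocked during the delay
        return item * 2;
    }
}
```

If any operation fails, awaiting the WhenAll task rethrows the first exception; the full set stays available on that task's Exception property.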

Upvotes: 15

lbergnehr

Reputation: 1598

Reactive Extensions (Rx) is another, newer library for asynchronous programming, especially useful for composing asynchronous events and methods.

It isn't part of the framework itself, but it is developed by Microsoft. It's available for both .NET 3.5 and .NET 4.0 and is essentially a collection of extension methods on the IObservable<T> interface introduced in .NET 4.0.

There are a lot of examples and tutorials on their main site, and I strongly recommend checking some of them out. The pattern might seem a bit odd at first (at least for .NET programmers), but well worth it, even if it's just grasping the new concept.

The real strength of Reactive Extensions (Rx.NET) shows when you need to compose multiple asynchronous sources and events. All the operators are designed with this in mind and handle the ugly parts of asynchrony for you.

Main site: http://msdn.microsoft.com/en-us/data/gg577609

Beginner's guide: http://msdn.microsoft.com/en-us/data/gg577611

Examples: http://rxwiki.wikidot.com/101samples

That said, the best async pattern probably depends on what situation you're in. Some are better (simpler) for simpler stuff and some are more extensible and easier to handle when it comes to more complex scenarios. I cannot speak for all the ones you're mentioning though.

Upvotes: 2

HasaniH

Reputation: 8402

Your first, third and fourth examples use the ThreadPool implicitly: Tasks are scheduled on the ThreadPool by default, the TPL extensions such as Parallel.ForEach use it as well, and a delegate's BeginInvoke also runs on a ThreadPool thread. The API simply hides some of the complexity. BackgroundWorker is part of the System.ComponentModel namespace because it is meant for use in UI scenarios.

Upvotes: 4

xumix

Reputation: 643

The last one (BackgroundWorker) is the best for criteria 2 and 3 at least, since it has built-in events and properties for them. The other variants are almost the same thing, just different versions or convenient wrappers.

Upvotes: -2
