AstroSharp
AstroSharp

Reputation: 1902

Correct way to a-synchronize parallel tasks

Currently we have this code which works fine:

Result result1 = null;
Result result2 = null;

var task1 = Task.Factory.StartNew(()=>
{
   var records = DB.Read("..");
   //Do A lot
   result1 = Process(records);  
}); 

var task2 = Task.Factory.StartNew(()=>
{
   var records = DB.Read(".....");
   //Do A lot
   result2 = Process(records);  
});

Task.WaitAll(task1, task2);

var result = Combine(result1, result2);

Now we would like to use async counterparts of DB Functions and we are using this new pattern:

Result result1 = null;
Result result2 = null;

var task1 = await Task.Factory.StartNew( async ()=>
{
   var records = await DB.ReadAsync("..");
   //Do A lot
   result1 = Process(records);  
}); 

var task2 = await Task.Factory.StartNew(async ()=>
{
   var records = await DB.ReadAsync(".....");
   //Do A lot
   result2 = Process(records);  
});

Task.WaitAll(task1, task2);

var result = Combine(result1, result2);

After we switched to async we started observing abnormal behavior. So I wonder if this is the correct pattern to parallelize async calls ?

Upvotes: 2

Views: 2997

Answers (3)

Tigran
Tigran

Reputation: 62265

Task.Factory.StartNew start a new Task of execution another independent execution unit. So the simplest way to deal with that may look like:

var task1 = Task.Factory.StartNew(()=> //NO AWAIT
{
   var records = DB.Read("....."); //NO ASYNC
   //Do A lot
   result1 = Process(records);  
});

... another task definition

Task.WaitAll(task1, task2);

Read and process sequentially in one task, as you have data dependency.

Upvotes: 0

norekhov
norekhov

Reputation: 4318

Well using .WaitAll is not an async programming because you're actually blocking current thread on waiting. Also you dont call .Unwrap and that's why you just wait only on creating of async lambda, not on async lambda itself.

Task.Run can unwrap async lambda for you. But there's a simpler and cleaner way.

var task1 = DB.ReadAsync("..").ContinueWith(task => {
   //Do A lot
   return Process(task.Result);  
}, TaskScheduler.Default);

var task2 = DB.ReadAsync("..").ContinueWith(task => {
   //Do A lot
   return Process(task.Result);  
}, TaskScheduler.Default);

var result = Combine(await task1, await task2);

In this way you will get result exactly when it's ready. So you don't need additional tasks and variables at all.

Please note that ContinueWith is a tricky function and it works on TaskScheduler.Current if it is not null and otherwise it works on TaskScheduler.Default which is thread pool scheduler. So it's safer to specify scheduler explicitly always when calling this function.

Also for claryfing I didn't included error checking because actually DB.ReadAsync can be completed with an error. But that's an easy thing and you can handle it for yourself.

Upvotes: 0

i3arnon
i3arnon

Reputation: 116636

Task.Factory.StartNew is a pre-async API. You should be using Task.Run which was designed with async-await in mind:

var task1 = await Task.Run( async ()=>
{
   var records = await DB.ReadAsync("..");
   //Do A lot
   result1 = Process(records);  
});

The issue is that an async lambda returns a Task so Task.Factory.StartNew returns a Task<Task> (the outer one because Task.Factory.StartNew returns a Task and the inner one which is the result of the async lambda).

This means that when you wait on task1 and task2 you aren't really waiting for the entire operation, just the synchronous part of it.

You can fix that by using Task.Unwrap on the returned Task<Task>:

Task<Task> task1 = await Task.Factory.StartNew(async ()=>
{
   var records = await DB.ReadAsync("..");
   //Do A lot
   result1 = Process(records);  
});

Task actualTask1 = task1.Unwrap();
await actualTask1;

But Task.Run does that implicitly for you.


As a side note, you should realize that you don't need Task.Run to execute these operations concurrently. You can do that just by calling these methods and awaiting the results together with Task.When:

async Task MainAsync()
{
    var task1 = FooAsync();
    var task2 = BarAsync();
    await Task.WhenAll(task1, task2);

    var result = Combine(task1.Result, task2.Result);
}

async Task<Result> FooAsync()
{
    var records = await DB.ReadAsync("..");
    //Do A lot
    return Process(records);  
}

async Task<Result> BarAsync()
{
    var records = await DB.ReadAsync(".....");
    //Do A lot
    return Process(records);
}

You only need Task.Run if you need to offload even the synchronous parts of these methods (the part before the first await) to the ThreadPool.

Upvotes: 3

Related Questions