Correct way to link Tasks together when return values are needed at different times #2

I asked a question yesterday and, unfortunately, even with the answers provided, I'm still hitting up on stumbling blocks about how to do things correctly... My issue is that my code actually works, but I'm a complete novice at concurrency programming and it feels like I'm not programming the correct way and, most importantly, I'm afraid of developing bad habits.

To make up a simplistic example to elaborate on yesterday's question, suppose I had the following methods:

static Task<IEnumerable<MyClass>> Task1(CancellationToken ct)

static Task<IEnumerable<int>> Task2(CancellationToken ct, List<string> StringList)

static Task<IEnumerable<String>> Task3(CancellationToken ct)

static Task<IEnumerable<Double>> Task4(CancellationToken ct)

static Task Task5(CancellationToken ct, IEnumerable<int> Task2Info, IEnumerable<string> Task3Info, IEnumerable<double> Task4Info)

static Task Task6(CancellationToken ct, IEnumerable<int> Task2Info, IEnumerable<MyClass> Task1Info)

And the code I've written that utilizes them looks as follows:

static Task execute(CancellationToken ct)
{
    IEnumerable<MyClass> Task1Info = null;
    List<string> StringList = null;
    IEnumerable<int> Task2Info = null;
    IEnumerable<string> Task3Info = null;
    IEnumerable<double> Task4Info = null;

    var TaskN = Task.Run(() =>
                    {
                        Task1Info = Task1(ct).Result;
                    }
                    , ct)
                    .ContinueWith(res =>
                    {
                        StringList = Task1Info.Select(k=> k.StringVal).ToList();
                        Task2Info = Task2(ct, StringList).Result;
                    }
                    , ct);

    return Task.Run(() =>
    {
        return Task.WhenAll
            (
                TaskN,
                Task.Run(() => { Task3Info = Task3(ct).Result; }, ct),
                Task.Run(() => { Task4Info = Task4(ct).Result; }, ct)
            )
            .ContinueWith(res =>
            {
                Task5(ct, Task2Info, Task3Info, Task4Info).Wait();
            }
            , ct)
            .ContinueWith(res =>
            {
                Task6(ct, Task2Info, Task1Info).Wait();
            }
            , ct);
    });
}

In other words:

I need the results of Task1 to calculate StringList and to run Task2
Task2, Task3 and Task4 can all run concurrently
I need the return values from all of the above for later method calls
Once these are run, I use their results to run Task5
Once Task5 is run, I use all the results in running Task6

As a simple explanation, imagine the first portion is data gathering, the second is data cleansing and the third data reporting

Like I said, my challenge is that this actually runs, but I simply feel that it's more of a "hack" that the right way to program - Concurrency programming is very new to me and I definitely want to learn the best ways this should be done...

Upvotes: 2

Answers (3)

Stephen Cleary

Reputation: 456637

it feels like I'm not programming the correct way and, most importantly, I'm afraid of developing bad habits.

I definitely want to learn the best ways this should be done...

First, distinguish between asynchronous and parallel, which are two different forms of concurrency. An operation is a good match for "asynchronous" if it doesn't need a thread to do its work - e.g., I/O and other logical operations where you wait for something like timers. "Parallelism" is about splitting up CPU-bound work across multiple cores - you need parallelism if your code is maxing out one CPU at 100% and you want it to run faster by using other CPUs.

Next, follow a few guidelines for which APIs are used in which situations. Task.Run is for pushing CPU-bound work to other threads. If your work isn't CPU-bound, you shouldn't be using Task.Run. Task<T>.Result and Task.Wait are even more on the parallel side; with few exceptions, they should really only be used for dynamic task parallelism. Dynamic task parallelism is extremely powerful (as @usr pointed out, you can represent any DAG), but it's also low-level and awkward to work with. There are almost always better approaches. ContinueWith is another example of an API that is for dynamic task parallelism.

Since your method signatures return Task, I'm going to assume that they're naturally asynchronous (meaning: they don't use Task.Run, and are probably implemented with I/O). In that case, Task.Run is the incorrect tool to use. The proper tools for asynchronous work are the async and await keywords.

So, taking your desired behavior one at a time:

// I need the results of Task1 to calculate StringList and to run Task2
var task1Result = await Task1(ct);
var stringList = CalculateStringList(task1Result);

// Task2, Task3 and Task4 can all run concurrently
var task2 = Task2(ct, stringList);
var task3 = Task3(ct);
var task4 = Task4(ct);
await Task.WhenAll(task2, task3, task4);

// I need the return values from all of the above for later method calls
var task2Result = await task2;
var task3Result = await task3;
var task4Result = await task4;

// Once these are run, I use their results to run Task5
await Task5(ct, task2Result, task3Result, task4Result);

// Once Task5 is run, I use all the results in running Task6
await Task6(ct, task2Result, task1Result);

For more about what APIs are appropriate in which situations, I have a Tour of Task on my blog, and I cover a lot of concurrency best practices in my book.

Upvotes: 2

sstan

Reputation: 36483

I would feel better about my answer if I could see how your TaskN methods were implemented. Specifically, I would want to validate the need for your TaskN method calls to be wrapped inside calls to Task.Run() if they are already returning a Task return value.

But personally, I would just use the async-await style of programming. I find it fairly easy to read/write. Documentation can be found here.

I would then envision your code looking something like this:

static async Task execute(CancellationToken ct)
{
    // execute and wait for task1 to complete.
    IEnumerable<MyClass> Task1Info = await Task1(ct);

    List<string> StringList = Task1Info.Select(k=> k.StringVal).ToList();

    // start tasks 2 through 4
    Task<IEnumerable<int>> t2 = Task2(ct, StringList);
    Task<IEnumerable<string>> t3 = Task3(ct);
    Task<IEnmerable<Double>> t4 = Task4(ct);

    // now that tasks 2 to 4 have been started,
    // wait for all 3 of them to complete before continuing.
    IEnumerable<int> Task2Info = await t2;
    IEnumerable<string> Task3Info = await t3;
    IEnumerable<Double> Task4Info = await t4;

    // execute and wait for task 5 to complete
    await Task5(ct, Task2Info, Task3Info, Task4Info);

    // finally, execute and wait for task 6 to complete
    await Task6(ct, Task2Info, Task1Info);
}

Upvotes: 3

usr

Reputation: 171188

I don't fully understand the code in question but it is very easy to link tasks by dependency. Example:

var t1 = ...;
var t2 = ...;
var t3 = Task.WhenAll(t1, t2).ContinueWith(_ => RunT3(t1.Result, t2.Result));

You can express an arbitrary dependency DAG like that. This also makes it unnecessary to store into local variables, although you can do that if you need.

That way you also can get rid of the awkward Task.Run(() => { Task3Info = Task3(ct).Result; }, ct) pattern. This is equivalent to Task3(ct) except for the store to the local variable which probably should not exist in the first place.

.ContinueWith(res =>
        {
            Task5(ct, Task2Info, Task3Info, Task4Info).Wait();
        }

This probably should be

.ContinueWith(_ => Task5(ct, Task2Info, Task3Info, Task4Info)).Unwrap()

Does that help? Leave a comment.

Upvotes: 2

Correct way to link Tasks together when return values are needed at different times #2

Answers (3)

Related Questions