Rogier

Reputation: 915

Run tasks in parallel and concatenate the output

I'd like to fetch data from multiple data providers. They all return the same structure of data, but with different contents. At the end, the output of the data sources needs to be concatenated so I can use the total result. To improve performance, these data sources need to be called in parallel. This is my current solution:

Task<List<Result>> dataSource1 = null;
Task<List<Result>> dataSource2 = null;
foreach (var dataSource in dataSourcesToBeFetched)
{
    switch (dataSource)
    {
        case DataSource.DataSource1:
            dataSource1 = DataSource1();
            break;

        case DataSource.DataSource2:
            dataSource2 = DataSource2();
            break;
    }
}
await Task.WhenAll(dataSource1, dataSource2);
var allData = dataSource1.Result.Append(dataSource2.Result);

But I am not happy with it. When adding more data sources, I need to append the new result to the list, which looks ugly. Besides that, I'd like to use switch expressions, but I am struggling with this.

Upvotes: 2

Views: 1013

Answers (2)

Panagiotis Kanavos

Reputation: 131631

All this code can be replaced with:

var results=await Task.WhenAll(DataSource1(),DataSource2());

The Task.WhenAll<TResult>(Task<TResult>[]) method returns a Task<TResult[]> with the results of all async operations.

Once you have the results, you can merge them with Enumerable.SelectMany:

var flattened=results.SelectMany(r=>r).ToList();

While you can combine both operations, it's best to avoid it, because the resulting code is hard to read, maintain and debug. During debugging, you'll often want to break after the await to check the results, e.g. for nulls or other unexpected values.

The tasks and the flattening step may run on different threads, which makes debugging the chained calls harder.

If you really need to, you can use ContinueWith after WhenAll to process the results in a threadpool thread before returning them:

var flatten=await Task.WhenAll(DataSource1(),DataSource2())
                      .ContinueWith(t=>t.Result.SelectMany(r=>r)
                                        .ToList());

Update

To fetch only the requested sources, a quick & dirty way would be to create a Dictionary that maps source IDs to methods and use LINQ's Select to pick them:

//In a field
Dictionary<DataSource,Func<Task<List<Result>>>> map=new (){
    [DataSource.Source1]=DataSource1,
    [DataSource.Source2]=DataSource2
};

//In the method
DataSource[] fetchSources=new DataSource[0];
var tasks=fetchSources.Select(s=>map[s]());

But that's little different from using a function to do the same job:

DataSource[] fetchSources=new DataSource[0];
var tasks=fetchSources.Select(s=>RunSource(s));
//or even 
//var tasks=fetchSources.Select(RunSource);
    
var results=await Task.WhenAll(tasks);
var flattened=results.SelectMany(r=>r).ToList();


public static Task<List<Result>> RunSource(DataSource source)
{
    return source switch {
            DataSource.Source1=> DataSource1(),
            DataSource.Source2=> DataSource2(),
            _=>throw new ArgumentOutOfRangeException(nameof(source))
    };
}
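
For reference, the fragments above can be assembled into one compilable sketch. The stub bodies of DataSource1 and DataSource2, the Result record with its Origin and Value fields, the Fetcher class and the FetchAllAsync name are all assumptions made up for illustration; only the enum member names and RunSource follow the snippets above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types used in the question.
public enum DataSource { Source1, Source2 }

public record Result(string Origin, int Value);

public static class Fetcher
{
    // Stub data sources (assumption): the real ones would call a service.
    static async Task<List<Result>> DataSource1()
    {
        await Task.Delay(50); // simulate slow I/O
        return new List<Result> { new Result("Source1", 1), new Result("Source1", 2) };
    }

    static async Task<List<Result>> DataSource2()
    {
        await Task.Delay(10); // finishes before DataSource1
        return new List<Result> { new Result("Source2", 3) };
    }

    // Maps a requested source to the task that fetches it.
    public static Task<List<Result>> RunSource(DataSource source) => source switch
    {
        DataSource.Source1 => DataSource1(),
        DataSource.Source2 => DataSource2(),
        _ => throw new ArgumentOutOfRangeException(nameof(source))
    };

    public static async Task<List<Result>> FetchAllAsync(IEnumerable<DataSource> sources)
    {
        // Select is lazy; Task.WhenAll enumerates it, which starts every task.
        var results = await Task.WhenAll(sources.Select(s => RunSource(s)));

        // The result array follows the order of the input tasks,
        // not the order in which they completed.
        return results.SelectMany(r => r).ToList();
    }
}
```

Calling `await Fetcher.FetchAllAsync(new[] { DataSource.Source1, DataSource.Source2 })` should yield the two Source1 items followed by the Source2 item, and an empty input yields an empty list, so no special case is needed when nothing is requested.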

Upvotes: 0

Jeroen van Langen

Reputation: 22073

One problem in your code is that if DataSource.DataSource1 (or DataSource.DataSource2) is not present in dataSourcesToBeFetched, the corresponding task variable stays null and you end up passing a null task to Task.WhenAll.
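
A minimal sketch of that failure mode, assuming the documented behaviour that Task.WhenAll throws an ArgumentException when its argument list contains a null task. The NullTaskDemo class and its method name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class NullTaskDemo
{
    // Returns true when Task.WhenAll rejects a null element in its arguments.
    public static bool WhenAllRejectsNullTask()
    {
        Task<List<int>> fetched = Task.FromResult(new List<int> { 1, 2 });
        Task<List<int>> skipped = null; // never assigned, like a source that was not requested

        try
        {
            // The argument check happens when WhenAll is called, before any await.
            Task.WhenAll(fetched, skipped);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```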

I would probably go for a collection of tasks to await.

Something like:

var dataSources = new List<Task<List<Result>>>();

// check if the DataSource1 is present in the dataSourcesToBeFetched
if(dataSourcesToBeFetched.Any(i => i == DataSource.DataSource1))
    dataSources.Add(DataSource1());

// check if the DataSource2 is present in the dataSourcesToBeFetched
if(dataSourcesToBeFetched.Any(i => i == DataSource.DataSource2))
    dataSources.Add(DataSource2());

// a list to hold all results
var allData = new List<Result>();

// if we need to fetch any, await all tasks.
if(dataSources.Count > 0)
{
    await Task.WhenAll(dataSources);

    // add the results to the list.
    foreach(var dataSource in dataSources)
        allData.AddRange(dataSource.Result);
}

Upvotes: 1
