Anatoly
Anatoly

Reputation: 1916

How to improve throughput by parallelizing database access?

I'm collecting data from MongoDB using the following method:

public IEnumerable<TOutput> GetMatchingJobs<TOutput>(JobInfoFilterParameters filterParameters, FetchOptions<JobInfoRecord, TOutput> fetchOptions)
{
    var filter = CreateFilterDefinition(filterParameters);
    var options = CreateFindOptions(fetchOptions, false);

    return MongoDatabaseStorageService.WithRecords<JobInfoRecord, List<TOutput>>
    (collection => collection.FindAsync(filter, options).Result.ToListAsync().Result);
}

/// <summary>
/// Performs an operation on the database, automatically disconnecting and retrying
/// if the database connection drops.
/// </summary>
/// <param name="operation"></param>
public TResult WithRecords<T, TResult>(Func<IMongoCollection<T>, TResult> operation)
{
    try { }
    finally { _lock.EnterUpgradeableReadLock(); }
    try
    {
        return WithRecordsInternal(operation);
    }
    finally
    {
        _lock.ExitUpgradeableReadLock();
    }
}

private TResult WithRecordsInternal<T, TResult>(Func<IMongoCollection<T>, TResult> operation)
{
    try
    {
        return operation(GetCollection<T>());
    }
    catch (IOException)
    {
        // There is an issue in mongo drivers when IOException is thrown instead of reconnection attempt.
        // Try repeat operation, it will force mongo to try to reconnect.
        return operation(GetCollection<T>());
    }
}

I was wondering about using async operation like FindAsync() and ToListAsync() with .Result

How can I improve performance (or throughput by parallelizing database access) by using async-await or what is the correct pattern to use async correcty (if I broke it)?

Upvotes: 0

Views: 105

Answers (2)

davidallyoung
davidallyoung

Reputation: 1332

You could in a sense parallel-ize the database calls, but you will need to use Async-Await from top to bottom. Here's an example of making a set of async calls, capturing their task promises, and waiting until they're all done.

var tasks = new List<Task>();
tasks.Add(AsyncCall1);
tasks.Add(AsyncCall2);

await Task.WhenAll(tasks);

When you call an Async method and you use .Result, you're immediately causing the calling thread to wait for the async method to complete. The above is just a simple example to convey the point that you can put asynchronous operations in parallel. In general, Async-Await is a bit of a virus that tends to spread throughout your code base as the best advantages come from when you use it all the way, top to bottom. Async that just uses .Result isn't really doing anything other than the same operation in a more expensive way than when you were using the synchronous calls.

One other potential option you would want to measure is adding the async calls to the task collection and then calling:

Task.WhenAll(tasks).Wait();

This would cause the calling thread to block on the .Wait, but the tasks in the collection would process in parallel, once all completed the calling thread would proceed. I would personally prefer to use Async-Await all the way, but you may not have that option.

This blog series may be helpful for you. http://blog.stephencleary.com/2012/02/async-and-await.html

Upvotes: 0

usr
usr

Reputation: 171178

You cannot improve database access throughput with async IO. All this does is change the way the call is initiated and completed. The exact same data is transmitted over the network.

You might be able to improve throughput by parallelizing database access but that is independent of async IO.

collection.FindAsync(filter, options).Result.ToListAsync().Result

Here, you are getting the worst possible perf: Higher call overhead due to async, then blocking which is again costly. If this was a good idea, the library would just do this pattern internally for you.

Upvotes: 3

Related Questions