Nardu

Reputation: 368

await on async call throws unexpected Timeout exception

I'm executing a series of async calls in a Service Fabric application, and one long-running call throws a TimeoutException after 5-10 minutes. My code is similar to this:

public class Listener {
    private async Task HandleRequestAsync(RestoreRequest request, RestoreWorker worker) {
        Response response = await worker.ExecuteAsync(request).ConfigureAwait(false);
    }
}


public class RestoreWorker {

    public async Task<Response> ExecuteAsync(RestoreRequest request) {
        RestoreService restoreService = new RestoreService(request);
        restoreService.Progress.ProgressChanged += async (sender, info) => await request.UpdateStatusAsync(new State(StateEnum.Running) { ProgressCurrent = info.Current, ProgressTotal = info.Total }).ConfigureAwait(false);
        await restoreService.RestoreAsync(request.Id, request.Name).ConfigureAwait(false);
        return new Response();
    }

    public Progress<ProgressInfo> Progress { get; } = new Progress<ProgressInfo>();
}

public class RestoreRequest {
    public async Task UpdateStatusAsync(Status status) {
        Message message = new Message { Status = status };
        await sender.SendAsync(message).ConfigureAwait(false);
    }
}

public class RestoreService {

    private static readonly IRestoreClient restoreClient =  ServiceProxyFactory.CreateServiceProxy<IRestoreClient>(new Uri($"{FabricConfig.ApplicationName}/RestoreClient"));

    private async Task <Project> GetProjectByNameAsync(string name){
    //return the project
    }

    public async Task RestoreAsync(string id, string name) {
        await restoreClient.RestoreAsync(id, name).ConfigureAwait(false);
    }
}

public class RestoreClient : IRestoreClient {
    public async Task RestoreAsync(string id, string name) {
        Project project = await GetProjectByNameAsync(name).ConfigureAwait(false);

        // Check for null before the project is dereferenced.
        if (project == null) {
            throw new Exception("Could not find project.");
        }

        await UpdateDbAsync(project.Id).ConfigureAwait(false);
    }

    private async Task UpdateDbAsync(string id) {
        try {
            List<string> input = CreateScripts();
            await ExecuteScriptsOnDbAsync(input).ConfigureAwait(false);
        } catch (SqlException ex) {
            // Keep the original SqlException as the inner exception.
            throw new Exception($"Project with id: '{id}' could not be created.", ex);
        }
    }

    private async Task ExecuteScriptsOnDbAsync(List<string> scripts) {
        using (var conn = new SqlConnection(connectionString)) {
            try {
                await conn.OpenAsync().ConfigureAwait(false);
                using (var sqlCommand = new SqlCommand { Connection = conn }) {
                    sqlCommand.CommandTimeout = SqlCommandCommandTimeout;
                    foreach (string script in scripts) {
                        sqlCommand.CommandText = script;
                        await sqlCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
                    }
                }
            } catch (SqlException ex) {
                Log.Fatal(ex, $"Cannot execute script on {Name}");
                throw;
            }
        }
    }
}

If the method UpdateDbAsync takes a long time to execute, I receive a TimeoutException:

System.AggregateException: One or more errors occurred. ---> System.TimeoutException: This can happen if message is dropped when service is busy or its long running operation and taking more time than configured Operation Timeout.

at Microsoft.ServiceFabric.Services.Communication.Client.ServicePartitionClient`1.<InvokeWithRetryAsync>d__24`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Services.Remoting.V1.Client.ServiceRemotingPartitionClient.<InvokeAsync>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Services.Remoting.Builder.ProxyBase.<InvokeAsync>d__15.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ServiceFabric.Services.Remoting.Builder.ProxyBase.<ContinueWithResult>d__16`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at RestoreService.<RestoreAsync>d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
at RestoreWorker.<ExecuteAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at Listener.<HandleRequestAsync>d__15.MoveNext()

Why am I getting a timeout even though no timeout is configured? What am I doing wrong? Any help is appreciated.

PS: This very same code used to work.

Upvotes: 0

Views: 1923

Answers (1)

Nardu

Reputation: 368

The problem is caused by the default 5-minute operation timeout of the remoting between services (Microsoft.ServiceFabric.Services.Remoting). The call in RestoreService.RestoreAsync goes through a remoting proxy, so it fails as soon as the remote operation runs longer than that timeout.

Version 2 of the remoting is available and, according to the Microsoft documentation, "The remoting V2 stack performs better".

After upgrading to V2, one way to solve the problem is to increase the timeout:

 new ServiceProxyFactory((c) => new FabricTransportServiceRemotingClientFactory(
                                       new FabricTransportRemotingSettings() {
                                           OperationTimeout = TimeSpan.FromMinutes(30)
                                       }));

However, this only raises the timeout; it does not remove it.

A different approach is to start a worker inside the service that was previously called through remoting, and then poll for its completion. The remoting calls that start the worker and query its status return quickly, so the solution is no longer bound to the remoting timeout.

For example, replace this:

await restoreClient.RestoreAsync(id, name).ConfigureAwait(false);

with this:

var workerId = StartANewWorker();
// declared outside the loop so the while condition can see it
JobState workerStatus;
do {
    // poll for the status of the new worker
    workerStatus = GetStatusOfTheWorker(workerId);

    if (workerStatus == JobState.Failed) {
        throw new Exception("Something went wrong");
    }

    await Task.Delay(1000).ConfigureAwait(false);
} while (workerStatus != JobState.Finished);
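
To make the polling idea above a bit more concrete, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: IRestoreJobs, InMemoryRestoreJobs, RestorePoller and the members of JobState are hypothetical names, not part of Service Fabric or of the original code. The in-memory class only stands in for the remote service so that the sketch runs on its own; in a real service the two interface methods would be exposed through remoting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public enum JobState { Running, Finished, Failed }

// Hypothetical contract: both calls return quickly, so neither of them
// stays open long enough to hit the remoting operation timeout.
public interface IRestoreJobs
{
    Task<Guid> StartRestoreAsync(string id, string name);
    Task<JobState> GetStateAsync(Guid jobId);
}

// In-memory stand-in for the remote service (assumption, for the demo only).
public sealed class InMemoryRestoreJobs : IRestoreJobs
{
    private readonly ConcurrentDictionary<Guid, JobState> jobs =
        new ConcurrentDictionary<Guid, JobState>();
    private readonly Func<string, string, Task> work;

    public InMemoryRestoreJobs(Func<string, string, Task> work)
    {
        this.work = work;
    }

    public Task<Guid> StartRestoreAsync(string id, string name)
    {
        Guid jobId = Guid.NewGuid();
        jobs[jobId] = JobState.Running;

        // Run the long operation in the background and record how it ended.
        Task.Run(async () =>
        {
            try
            {
                await work(id, name).ConfigureAwait(false);
                jobs[jobId] = JobState.Finished;
            }
            catch (Exception)
            {
                jobs[jobId] = JobState.Failed;
            }
        });

        return Task.FromResult(jobId);
    }

    public Task<JobState> GetStateAsync(Guid jobId)
    {
        return Task.FromResult(jobs[jobId]);
    }
}

public static class RestorePoller
{
    // Starts the job, then polls until it has finished or failed.
    public static async Task RestoreAndWaitAsync(
        IRestoreJobs jobs, string id, string name, TimeSpan pollInterval,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        Guid jobId = await jobs.StartRestoreAsync(id, name).ConfigureAwait(false);

        while (true)
        {
            JobState state = await jobs.GetStateAsync(jobId).ConfigureAwait(false);
            if (state == JobState.Finished)
            {
                return;
            }
            if (state == JobState.Failed)
            {
                throw new InvalidOperationException($"Restore job {jobId} failed.");
            }
            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

With this shape, the caller would use something like `await RestorePoller.RestoreAndWaitAsync(jobs, id, name, TimeSpan.FromSeconds(1))` in place of the single long remoting call. Note that job state held only in memory is lost if the process restarts, so a real service would probably need to persist it somewhere durable.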

Upvotes: 1
