Reputation: 3498
We have a third-party method Foo
which sometimes runs in a deadlock for unknown reasons.
We are executing an single-threaded tcp-server and call this method every 30 seconds to check that the external system is available.
To mitigate the problem with the deadlock in the third party code we put the ping-call in a Task.Run
to so that the server does not deadlock.
Like
async Task<bool> WrappedFoo()
{
var timeout = 10000;
var task = Task.Run(() => ThirdPartyCode.Foo());
var delay = Task.Delay(timeout);
if (delay == await Task.WhenAny(delay, task ))
{
return false;
}
else
{
return await task ;
}
}
But this (in our opinion) has the potential to starve the application of free threads. Since if one call to ThirdPartyCode.Foo
deadlock the thread will never recover from this deadlock and if this happens often enough we might run out of resources.
Is there a general approach how one should handle deadlocking third-party code?
A CancellationToken
won't work because the third-party-api does not provide any cancellation options.
Update: The method at hand is from the SAPNCO.dll provided by SAP to establish and test rfc-connections to a sap-system, therefore the method is not a simple network-ping. I renamed the method in the question to avoid further misunderstandings
Upvotes: 5
Views: 642
Reputation: 456887
Is there a general approach how one should handle deadlocking third-party code?
Yes, but it's not easy or simple.
The problem with misbehaving code is that it can not only leak resources (e.g., threads), but it can also indefinitely hold onto important resources (e.g., some internal "handle" or "lock").
The only way to forcefully reclaim threads and other resources is to end the process. The OS is used to cleaning up misbehaving processes and is very good at it. So, the solution here is to start a child process to do the API call. Your main application can communicate with its child process by redirected stdin/stdout, and if the child process ever times out, the main application can terminate it and restart it.
This is, unfortunately, the only reliable way to cancel uncancelable code.
Upvotes: 4
Reputation: 9360
Cancelling a task is a collaborative operation in that you pass a CancellationToken
to the desired method and externally you use CancellationTokenSource.Cancel
:
public void Caller()
{
try
{
CancellationTokenSource cts=new CancellationTokenSource();
Task longRunning= Task.Run(()=>CancellableThirdParty(cts.Token),cts.Token);
Thread.Sleep(3000); //or condition /signal
cts.Cancel();
}catch(OperationCancelledException ex)
{
//treat somehow
}
}
public void CancellableThirdParty(CancellationToken token)
{
while(true)
{
// token.ThrowIfCancellationRequested() -- if you don't treat the cancellation here
if(token.IsCancellationRequested)
{
// code to treat the cancellation signal
//throw new OperationCancelledException($"[Reason]");
}
}
}
As you can see in the code above , in order to cancel an ongoing task , the method running inside it must be structured around the CancellationToken.IsCancellationRequested
flag or simply CancellationToken.ThrowIfCancellationRequested
method ,
so that the caller just issues the CancellationTokenSource.Cancel
.
Unfortunately if the third party code is not designed around CancellationToken
( it does not accept a CancellationToken
parameter ), then there is not much you can do.
Upvotes: 1
Reputation: 131581
Your code isn't cancelling the blocked operation. Use a CancellationTokenSource and pass a cancellation token to Task.Run
instead :
var cts=new CancellationTokenSource(timeout);
try
{
await Task.Run(() => ThirdPartyCode.Ping(),cts.Token);
return true;
}
catch(TaskCancelledException)
{
return false;
}
It's quite possible that blocking is caused due to networking or DNS issues, not actual deadlock.
That still wastes a thread waiting for a network operation to complete. You could use .NET's own Ping.SendPingAsync to ping asynchronously and specify a timeout:
var ping=new Ping();
var reply=await ping.SendPingAsync(ip,timeout);
return reply.Status==IPStatus.Success;
The PingReply class contains far more detailed information than a simple success/failure. The Status property alone differentiates between routing problems, unreachable destinations, time outs etc
Upvotes: -1