Ackdari

Reputation: 3498

How to handle a deadlock in third-party code

We have a third-party method Foo which sometimes runs into a deadlock for unknown reasons.

We are running a single-threaded TCP server and call this method every 30 seconds to check that the external system is available.

To mitigate the deadlock in the third-party code, we wrapped the call in a Task.Run so that the server itself does not deadlock.

Like this:

async Task<bool> WrappedFoo()
{
    var timeout = 10000; // milliseconds

    var task = Task.Run(() => ThirdPartyCode.Foo());
    var delay = Task.Delay(timeout);

    if (delay == await Task.WhenAny(delay, task))
    {
        // Timed out; Foo may still be blocking its thread-pool thread.
        return false;
    }
    else
    {
        return await task;
    }
}

But this (in our opinion) has the potential to starve the application of free threads: if a call to ThirdPartyCode.Foo deadlocks, its thread never recovers, and if this happens often enough we might run out of resources.

Is there a general approach to handling deadlocking third-party code?

A CancellationToken won't work because the third-party API does not provide any cancellation options.

Update: The method at hand is from the SAPNCO.dll provided by SAP to establish and test RFC connections to an SAP system, so it is not a simple network ping. I renamed the method in the question to avoid further misunderstandings.

Upvotes: 5

Views: 642

Answers (3)

Stephen Cleary

Reputation: 456887

Is there a general approach to handling deadlocking third-party code?

Yes, but it's not easy or simple.

The problem with misbehaving code is that it can not only leak resources (e.g., threads), but it can also indefinitely hold onto important resources (e.g., some internal "handle" or "lock").

The only way to forcefully reclaim threads and other resources is to end the process. The OS is used to cleaning up misbehaving processes and is very good at it. So, the solution here is to start a child process to do the API call. Your main application can communicate with its child process by redirected stdin/stdout, and if the child process ever times out, the main application can terminate it and restart it.

This is, unfortunately, the only reliable way to cancel uncancelable code.
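
A minimal sketch of this idea, under some simplifying assumptions: the helper executable is hypothetical (a small program that calls ThirdPartyCode.Foo and exits with code 0 on success), a new child process is started per call, and the result is reported through the exit code rather than through redirected stdin/stdout. The names IsolatedCall and RunWithTimeout are illustrative only.

```csharp
using System;
using System.Diagnostics;

public static class IsolatedCall
{
    // Starts the given executable and waits up to timeoutMs for it to exit.
    // Returns true only if it exited in time with exit code 0.
    // Assumption: fileName points to a hypothetical worker program that
    // wraps the third-party call and signals success via its exit code.
    public static bool RunWithTimeout(string fileName, string arguments, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            if (process == null)
            {
                return false;
            }

            if (process.WaitForExit(timeoutMs))
            {
                return process.ExitCode == 0;
            }

            // Timed out: terminate the child so the OS reclaims its
            // threads, handles, and locks.
            try
            {
                process.Kill();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout and the Kill call.
            }

            process.WaitForExit();
            return false;
        }
    }
}
```

A caller would then replace the Task.Run wrapper with something like IsolatedCall.RunWithTimeout("FooWorker.exe", "", 10000), where FooWorker.exe is the hypothetical helper. A long-lived child that is restarted only after a timeout avoids the per-call startup cost, at the price of a small protocol over stdin/stdout.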

Upvotes: 4

Bercovici Adrian

Reputation: 9360

Cancelling a task is a cooperative operation: you pass a CancellationToken to the target method, and the caller signals cancellation with CancellationTokenSource.Cancel:

public async Task Caller()
{
    using (var cts = new CancellationTokenSource())
    {
        try
        {
            Task longRunning = Task.Run(() => CancellableThirdParty(cts.Token), cts.Token);
            await Task.Delay(3000); // or wait for a condition/signal
            cts.Cancel();
            await longRunning; // the cancellation surfaces here
        }
        catch (OperationCanceledException)
        {
            // handle the cancellation
        }
    }
}

public void CancellableThirdParty(CancellationToken token)
{
    while (true)
    {
        // Either throw directly with token.ThrowIfCancellationRequested(),
        // or check the flag and clean up before throwing:
        if (token.IsCancellationRequested)
        {
            // code to handle the cancellation signal
            throw new OperationCanceledException(token);
        }
    }
}

As the code above shows, to cancel an ongoing task, the method running inside it must be structured around the CancellationToken.IsCancellationRequested flag or the CancellationToken.ThrowIfCancellationRequested method, so that the caller only has to call CancellationTokenSource.Cancel.

Unfortunately, if the third-party code is not designed around CancellationToken (it does not accept a CancellationToken parameter), there is not much you can do.

Upvotes: 1

Panagiotis Kanavos

Reputation: 131581

Your code isn't cancelling the blocked operation. Use a CancellationTokenSource and pass a cancellation token to Task.Run instead:

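var cts = new CancellationTokenSource(timeout);

try
{
    // Note: the token passed to Task.Run cancels the task only if the
    // delegate has not started running yet.
    await Task.Run(() => ThirdPartyCode.Ping(), cts.Token);
    return true;
}
catch (TaskCanceledException)
{
    return false;
}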

It's quite possible that the blocking is caused by networking or DNS issues, not an actual deadlock.

That still wastes a thread waiting for a network operation to complete. You could use .NET's own Ping.SendPingAsync to ping asynchronously and specify a timeout:

var ping = new Ping();

var reply = await ping.SendPingAsync(ip, timeout);
return reply.Status == IPStatus.Success;

The PingReply class contains far more detailed information than a simple success/failure. The Status property alone differentiates between routing problems, unreachable destinations, timeouts, etc.
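
As a rough illustration of that last point, the sketch below maps a few IPStatus values to coarse categories. The class name, method name, and category strings are made up for this example; only the IPStatus members come from System.Net.NetworkInformation.

```csharp
using System.Net.NetworkInformation;

public static class PingStatusText
{
    // Maps a PingReply.Status value to a coarse, human-readable category.
    // The grouping is an illustrative choice, not an official classification.
    public static string Describe(IPStatus status)
    {
        switch (status)
        {
            case IPStatus.Success:
                return "reachable";
            case IPStatus.TimedOut:
                return "timed out";
            case IPStatus.DestinationHostUnreachable:
            case IPStatus.DestinationNetworkUnreachable:
                return "unreachable";
            case IPStatus.TtlExpired:
                return "routing problem (TTL expired)";
            default:
                return "failed: " + status;
        }
    }
}
```

With the reply from SendPingAsync above, PingStatusText.Describe(reply.Status) could be logged alongside the boolean result.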

Upvotes: -1
