Get ALL stacktraces in async/await application

Question

I want to get info about all call stacks (or get all stacktraces) in my asynchronous C# application. I know, how to get stacktraces of all existing threads.

But how to get info about all call stacks released by await, which do not have a running thread on it?

CONTEXT EXAMPLE

Suppose the following code:

private static async Task Main()
{
    async Task DeadlockMethod(SemaphoreSlim lock1, SemaphoreSlim lock2)
    {
        await lock1.WaitAsync();
        await Task.Delay(500);
        await lock2.WaitAsync(); // this line causes the deadlock
    }

    SemaphoreSlim lockA = new SemaphoreSlim(1);
    SemaphoreSlim lockB = new SemaphoreSlim(1);

    Task call1 = Task.Run(() => DeadlockMethod(lockA, lockB));
    Task call2 = Task.Run(() => DeadlockMethod(lockB, lockA));

    Task waitTask = Task.Delay(1000);

    await Task.WhenAny(call1, call2, waitTask);

    if (!call1.IsCompleted
        && !call2.IsCompleted)
    {
        // DUMP STACKTRACES to find the deadlock
    }
}

I would like to dump all stacktraces, even those not having its thread currently, so that I can find the deadlock.

If line await lock2.WaitAsync(); is changed to lock2.Wait();, then it would be possible by already mentioned get stacktraces of all threads. But how to list all stacktraces without a running thread?

PREVENTION OF MISUNDERSTANDING:

The example is very simplified, it just ilustrates one of potential complications. The original problem is a complex multithreaded application, which runs on a server and many hard-to-investigate parallel-related issues may happen.
We would use the list of async/await stacktraces not only to find deadlocks, but also for other purposes. Therefore please do not advice me how to avoid deadlocks or how to write a multithreaded application - that is not the point of the question.
You can answer this generally, but also solution working at least on .Net Core 3.1 is enough.

Stephen Cleary · Accepted Answer

I know, how to get stacktraces of all existing threads.

Just gonna give a bit of background here.

In Windows, threads are an OS concept. They're the unit of scheduling. So there's a definite list of threads somewhere, since that's what the OS scheduler uses.

Furthermore, each thread has a call stack. This dates back to the early days of computer programming. However, the purpose of the call stack is often misunderstood. The call stack is used as a sequence of return locations. When a method returns, it pops its call stack arguments off the stack and also the return location, and then jumps to the return location.

This is important to remember because the call stack does not represent how code got into a situation; it represents where code is going it returns from the current method. The call stack is where the code is going to, not where it came from. That is the reason the call stack exists: to direct the future code, not to assist diagnostics. Now, it does turn out that the call stack does have useful information on it for diagnostics since it gives an indication of where the code came from as well as where it's going, so that's why call stacks are on exceptions and are commonly used for diagnostics. But that's not the actual reason why the call stack exists; it's just a happy circumstance.

Now, enter asynchronous code.

In asynchronous code, the call stack still represents where the code is returning to (just like all call stacks). But in asynchronous code, the call stack no longer represents where the code came from. In the synchronous world, these two things were the same, and the call stack (which is necessary) can also be used to answer the question of "how did this code get here?". In the asynchronous world, the call stack is still necessary but only answers the question "where is this code going?" and cannot answer the question "how did this code get here?". To answer the "how did this code get here?" question you need a causality chain.

Furthermore, call stacks are necessary for correct operation (in both the synchronous and asynchronous worlds), and so the compiler/runtime ensures they exist. Causality chains are not necessary, and they are not provided out of the box. In the synchronous world, the call stack just happens to be a causality chain, which is nice, but that happy circumstance doesn't carry over to the asynchronous world.

When a thread is released by await, the stacktrace and all objects along the call stack are stored somewhere.

No; this is not the case. This would be true if async used fibers, but it doesn't. There is no call stack saved anywhere.

Because otherwise the continuation thread would lose context.

When an await resumes, it only needs sufficient context to continue executing its own method, and potentially completing the method. So, there is an async state machine structure that is boxed and placed on the heap; this structure contains references to local variables (including this and method arguments). But that is all that is necessary for program correctness; a call stack is not necessary and so it is not stored.

You can easily see this yourself by setting a breakpoint after an await and observing the call stack. You'll see that the call stack is gone after the first await yields. Or - more properly - the call stack represents the code that is continuing the async method, not the code that originally started the async method.

At the implementation level, async/await is more like callbacks than anything else. When a method hits an await, it sticks its state machine structure on the heap (if it hasn't already) and wires up a callback. That callback is triggered (invoked directly) when the task completes, and that continues executing the async method. When that async method completes, it completes its tasks, and anything awaiting those tasks are then invoked to continue executing. So, if a whole sequence of tasks complete, you actually end up with a call stack that is an inversion of the causality stack.

I would like to dump all stacktraces, even those not having its thread currently, so that I can find the deadlock.

So, there's a couple of problems here. First, there is no global list of all Task objects (or more generally, tasklike objects). And that would be a difficult thing to get.

Second, for each asynchronous method/task, there's no causality chain anyway. The compiler doesn't generate one because it's not necessary for correct operation.

That's not to say either of these problems are insurmountable - just difficult. I've done some work on the causality chain problem with my AsyncDiagnostics library. It's rather old at this point but should upgrade pretty easily to .NET Core. It uses PostSharp to modify the compiler-generated code for each method and manually track causality chains.

However, the goal of AsyncDiagnotics is to get causality chains onto exceptions. Getting a list of all tasklikes and associating causality chains with each one is another problem, likely requiring the use of an attached profiler. I'm aware of other companies who have wanted this solution, but none of them have dedicated the time necessary to create one; all of them have found it more efficient to implement code reviews, auditing, and developer training.

Get ALL stacktraces in async/await application

Answers (2)

Related Questions