Daniel Schröder
Daniel Schröder

Reputation: 11

Could you implement async-await by memcopying stack frames rather than creating state machines?

I am trying to understand all the low-level stuff Compilers / Interpreters / the Kernel do for you (because I'm yet another person who thinks they could design a language that's better than most others)

One of the many things that sparked my curiosity is Async-Await. I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines. I've not found anything useful by googling ("async copy stack frame" and variations) or in the "Similar questions" section.

To me, this method seems rather complicated and overhead-heavy;

Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?

I'm aware that it is architecturally impossible for some languages (I thank the CLR can't do it, so C# can't either).

Am I missing something that makes this logically impossible? I would expect less complicated code and a performance boost from doing it that way, am I mistaken? I suppose when you have a deep stack hierarchy after a async call (eg. a recursive async function) the amount of data you would have to memcopy is rather large, but there are probably ways to work around that.

If this is possible, then why isn't it done anywhere?

Upvotes: 0

Views: 410

Answers (3)

frodeborli
frodeborli

Reputation: 1657

I have been looking into various strategies for doing this myseøf, because I naturally thi k I can design a language better than anybody else - same as you. I just want to emphasize that when I say better, I actually mean better as in tastes better for my liking, and not objectively better.

I have come to a few different approaches, and to summarize: It really depends on many other design choices you have made in the language.

It is all about compromises; each approach has advantages and disadvantages.

It feels like the compiler design community are still very focused on garbage collection and minimizing memory waste, and perhaps there is room for some innovation for more lazy and less purist language designers given the vast resources available to modern computers?

How about not having a call stack at all?

It is possible to implement a language without using a call stack.

  1. Pass continuations. The function currently running is responsible for keeping and resuming the state of the caller. Async/await and generators come naturally.

  2. Preallocated static memory addresses for all local variables in all declared functions in the entire program. This approach causes other problems, of course.

If this is your design, then asymc functions seem trivial

Tree shaped stack

With a tree shaped stack, you can keep all stack frames until the function is completely done. It does not matter if you allow progress on any ancestor stack frame, as long as you let the async frame live on until it is no longer needed.

Linear stack

How about serializing the function state? It seems like a variant of continuations.

Independent stack frames on the heap

Simply treat invocations like you treat other pointers to any value on the heap.

All of the above are trivialized approaches, but one thing they have in common related to your question:

Just find a way to store any locals needed to resume the function. And don't forget to store the program counter in the stack frame as well.

Upvotes: 0

Sai Gummaluri
Sai Gummaluri

Reputation: 1394

I would first like to begin by saying that this answer is only meant to serve as a starting point to go in the actual direction of your exploration. This includes various pointers and building up on the work of various other authors

I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines

You understood correctly that the Async/Await implementation for C# and Rust use state machines. Let us understand now as to why are those implementations chosen.

To put the general structure of stack frames in very simple terms, whatever we put inside a stack frame are temporary allocations which are not going to outlive the method which resulted in the addition of that stack frame (including, but not limited to local variables). It also contains the information of the continuation, ie. the address of the code that needs to be executed next (in other words, the control has to return to), within the context of the recently called method. If this is a case of synchronous execution, the methods are executed one after the other. In other words, the caller method is suspended until the called method finishes execution. This, from a stack perspective fits in intuitively. If we are done with the execution of a called method, the control is returned to the caller and the stack frame can be popped off. It is also cheap and efficient from a perspective of the hardware that is running this code as well (hardware is optimised for programming with stacks).

In the case of asynchronous code, the continuation of a method might have to trigger several other methods that might get called from within the continuation of callers. Take a look at this answer, where Eric Lippert outlines the entirety of how the stack works for an asynchronous flow. The problem with asynchronous flow is that, the method calls do not exactly form a stack and trying to handle them like pure stacks may get extremely complicated. As Eric says in the answer, that is why C# uses graph of heap-allocated tasks and delegates that represents a workflow.

However, if you consider languages like Go, the asynchrony is handled in a different way altogether. We have something called Goroutines and here is no need for await statements in Go. Each of these Goroutines are started on their own threads that are lightweight (each of them have their own stacks, which defaults to 8KB in size) and the synchronization between each of them is achieved through communication through channels. These lightweight threads are capable of waiting asynchronously for any read operation to be performed on the channel and suspend themselves. The earlier implementation in Go is done using the SplitStacks technique. This implementation had its own problems as listed out here and replaced by Contigious Stacks. The article also talks about the newer implementation.

One important thing to note here is that it is not just the complexity involved in handling the continuation between the tasks that contribute to the approach chosen to implement Async/Await, there are other factors like Garbage Collection that play a role. GC process should be as performant as possible. If we move stacks around, GC becomes inefficient because accessing an object then would require thread synchronization.

Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?

In short, you can. As this answer states here, Chicken Scheme uses a something similar to what you are exploring. It begins by allocating everything on the stack and move the stack values to heap when it becomes too large for the GC activities (Chicken Scheme uses Generational GC). However, there are certain caveats with this kind of implementation. Take a look at this FAQ of Chicken Scheme. There is also lot of academic research in this area (linked in the answer referred to in the beginning of the paragraph, which I shall summarise under further readings) that you may want to look at.

Further Reading

Continuation Passing Style

call-with-current-continuation

The classic SICP book

This answer (contains few links to academic research in this area)

TLDR

The decision of which approach to be taken is subjective to factors that affect the overall usability and performance of the language. State Machines are not the only way to implement the Async/Await functionality as done in C# and Rust. Few languages like Go implement a Contigious Stack approach coordinated over channels for asynchronous operations. Chicken Scheme allocates everything on the stack and moves the recent stack value to heap in case it becomes heavy for its GC algorithm's performance. Moving stacks around has its own set of implications that affect garbage collection negatively. Going through the research done in this space will help you understand the advancements and rationale behind each of the approaches. At the same time, you should also give a thought to how you are planning on designing/implementing the other parts of your language for it be anywhere close to be usable in terms of performance and overall usability.

PS: Given the length of this answer, will be happy to correct any inconsistencies that may have crept in.

Upvotes: 3

Matt Timmermans
Matt Timmermans

Reputation: 59194

Yes, an alternative to converting code into state machines is copying stacks around. This is the way that the go language does it now, and the way that Java will do it when Project Loom is released.

It's not an easy thing to do for real-world languages.

It doesn't work for C and C++, for example, because those languages let you make pointers to things on the stack. Those pointers can be used by other threads, so you can't move the stack away, and even if you could, you would have to copy it back into exactly the same place.

For the same reason, it doesn't work when your program calls out to the OS or native code and gets called back in the same thread, because there's a portion of the stack you don't control. In Java, project Loom's 'virtual threads' will not release the thread as long as there's native code on the stack.

Even in situations where you can move the stack, it requires dedicated support in the runtime environment. The stack can't just be copied into a byte array. It has to be copied off in a representation that allows the garbage collector to recognize all the pointers in it. If C# were to adopt this technique, for example, it would require significant extensions to the common language runtime, whereas implementing state machines can be accomplished entirely within the C# compiler.

Upvotes: 5

Related Questions