Reputation: 503
This question is not about coroutines in C++20 but coroutines in general.
I'm learning C++20 coroutines these days. I've learnt about stackful and stackless coroutines from Coroutines Introduction. I've also searched Stack Overflow for more information.
Here's my understanding about stackless coroutines:
A stackless coroutine does have a stack, on the caller's stack, while it's running.
When it suspends itself, as stackless coroutines can only suspend at the top-level function, its stack is predictable and useful data are stored in a certain area.
When it's not running, it doesn't have a stack. It's bound with a handle, by which the client can resume the coroutine.
The Coroutines TS specifies that the non-array operator new is called when allocating storage for coroutine frames. However, I think this is unnecessary, hence my question.
Some explanation/consideration:
Where to put the coroutine's status instead? In the handle, which originally stores the pointer.
Dynamic allocation doesn't mean storing on the heap. But my intent is to elide calls to operator new, no matter how it is implemented.
From cppreference:
The call to operator new can be optimized out (even if custom allocator is used) if
The lifetime of the coroutine state is strictly nested within the lifetime of the caller, and
the size of coroutine frame is known at the call site
For the first requirement, storing the state directly in the handle is still okay if the coroutine outlives the caller.
For the other, if the caller doesn't know the size, how can it compose the argument to call operator new? Actually, I can't even imagine a situation in which the caller doesn't know the size.
Rust seems to have a different implementation, according to this question.
Upvotes: 5
Views: 3637
Reputation: 4829
Don't confuse the stack of a coroutine with the state of a coroutine.
A stackful coroutine hosts both its state and its stack in a separate frame allocated somewhere on the heap.
A stackless coroutine hosts its state in a frame on the heap but uses the stack of the resumer thread to push and pop values. If those values are significant for the state of the coroutine, the push and pop operations will directly influence the state fields in the frame, if not it will just use the stack for temporary processing. How does the compiler decide which operations influence the state of the coroutine? It does this in a process called coroutine transformation during compilation.
As you might guess, stackful coroutines have one big drawback: you never really know in advance how much heap to allocate for the full stack (which is the case with every thread as well). But instead of dealing with this once per thread, you have to deal with it every time you create a coroutine.
Upvotes: 0
Reputation: 275740
The fundamental difference between stackful and stackless coroutines is whether the coroutine owns a full stack, theoretically unbounded (but bounded in practice), like a thread does.
In a stackful coroutine, the local variables of the coroutine are stored on the stack it owns, like anything else, both during execution and when suspended.
In a stackless coroutine, the coroutine's local variables may or may not be on the stack while the coroutine is running. They are stored in a fixed-size buffer that the stackless coroutine owns.
In theory, a stackless coroutine can be stored on someone else's stack. There is, however, no way to guarantee within C++ code that this happens.
Elision of operator new in the creation of a coroutine is sort of about doing that. If your coroutine object is stored on someone's stack, and new was elided because there is enough room in the coroutine object itself for its state, then a stackless coroutine that lives completely on someone else's stack is possible.
There is no way to guarantee this in the current implementation of C++ coroutines. Attempts to get that in were met with resistance by compiler developers, because the exact minimal capture that a coroutine does is determined "later" in their compiler than the point at which they need to know how big the coroutine is.
This leads to the difference in practice. A stackful coroutine acts more like a thread. You can call normal functions, and those normal functions can interact within their bodies with coroutine operations like suspend.
A stackless coroutine cannot call a function that then interacts with the coroutine machinery. Interacting with the coroutine machinery is only permitted within the stackless coroutine itself.
A stackful coroutine has all of the machinery of a thread without being scheduled on the OS. A stackless coroutine is an augmented function object that has goto labels in it that let it be resumed part way through its body.
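To make the "augmented function object with goto labels" picture concrete, here is a minimal hand-written sketch. It is not what any compiler actually emits; the type `CountToThree` and its members are hypothetical names chosen for illustration. The `switch` plays the role of the goto labels, and every variable that must survive a suspension is a data member, so the whole coroutine is a fixed-size object that can live wherever its owner puts it.

```cpp
#include <cassert>

// Hypothetical hand-written stand-in for a stackless coroutine that
// yields 0, 1, 2. All state that outlives a suspension is a member.
struct CountToThree {
    int resume_point = 0;  // which "label" to jump to on the next resume
    int i = 0;             // a local that lives across suspension points
    int current = -1;      // most recently yielded value
    bool done = false;

    // Runs until the next suspension point; returns false once finished.
    bool resume() {
        switch (resume_point) {
        case 0:
            for (i = 0; i < 3; ++i) {
                current = i;
                resume_point = 1;
                return true;   // "suspend": hand control back to the resumer
        case 1:;               // a later resume() re-enters the loop here
            }
        }
        resume_point = 2;      // no matching case: later calls skip the body
        done = true;
        return false;
    }
};
```

A caller can declare `CountToThree c;` as an ordinary local variable and call `c.resume()` repeatedly, reading `c.current` after each call that returns `true`. Note that `resume()` can only suspend from its own body, which mirrors the restriction described above.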
There are theoretical implementations of stackless coroutines that don't have the "could call new" feature. The C++ standard doesn't require such a type of stackless coroutine.
Some people proposed them. Their proposals lost out to the current one, in part because the current one was far more polished and closer to being shipped than the alternative proposals were. Some of the syntax of the alternative proposals ended up in the successful proposal.
I believe there was a convincing argument that the "stricter" fixed-size no-new coroutine implementations were not ruled out by the current proposal, and could be added on afterwards, and that helped kill the alternative proposals.
Upvotes: 5
Reputation: 473966
A stackless coroutine does have a stack, on the caller's stack, while it's running.
That right there is the source of your misunderstanding.
Continuation-based coroutines (which is what a "stackless coroutine" is) are a coroutine mechanism designed for handing a coroutine to some other code, which will resume its execution after some asynchronous process completes. This resumption may take place in some other thread.
As such, the stack cannot be assumed to be "on the caller's stack", since the caller and the process that schedules the coroutine's resumption are not necessarily in the same thread. The coroutine needs to be able to outlive the caller, so in general the coroutine's stack cannot be on the caller's stack (in certain co_yield-style cases, it can be).
The coroutine handle represents the coroutine's stack. So long as that handle exists, so too does the coroutine's stack.
When it's not running, it doesn't have a stack. It's bound with a handle, by which the client can resume the coroutine.
And how does this "handle" store all of the local variables for the coroutine? Obviously they are preserved (it'd be a bad coroutine mechanism if they weren't), so they have to be stored somewhere. The place where a function's local variables live is called the "stack".
Calling it a "handle" doesn't change what it is.
But my intent is to elide calls to operator new, no matter how it is implemented.
Well... you can't. If never calling new is a vital component of writing whatever software you're writing, then you can't use co_await-style coroutine continuations. There is no set of rules you can use that guarantees elision of new in coroutines. If you're using a specific compiler, you can do some tests to see what it elides and what it doesn't, but that's it.
The rules you cite are merely cases that make it possible to elide the call.
For the other, if the caller doesn't know the size, how can it compose the argument to call operator new?
Remember: co_await coroutines in C++ are effectively an implementation detail of a function. The caller has no idea if any function it calls is or is not a coroutine. All coroutines look like regular functions from the outside.
The code for creating a coroutine stack happens within the function call, not outside of it.
Upvotes: 7
Reputation: 18081
Consider this hypothetical case:
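    // `task` and `awaitable` are hypothetical types, not defined here.
    void foo(int);

    task coroutine() {
        int a[100] {};
        int* p = a;  // p points into the coroutine's own storage
        while (true) {
            co_await awaitable{};
            foo(*p);
        }
    }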
p points to the first element of a. If a's memory location changed between two resumptions, p would no longer hold the right address.
Memory for what would be the function stack must be allocated in such a way that it is preserved between a suspension and the following resumption. But this memory cannot be moved or copied if some objects refer to objects that are within this memory (or at least not without adding complexity). This is one reason why compilers sometimes need to allocate this memory on the heap.
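The problem can be reproduced without any coroutine machinery. The sketch below uses a hypothetical `Frame` struct as a stand-in for the storage a compiler might set aside for `a` and `p`; the real frame layout is implementation-defined. It relocates the frame bitwise and checks where the pointer ends up.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the coroutine's saved locals.
struct Frame {
    int a[100] {};
    int* p = a;  // points into this very object
};

// True if the pointer stored in `f` refers to f's own array.
bool pointer_is_internal(const Frame& f) { return f.p == f.a; }

// Copies the frame byte for byte, as a naive relocation would,
// and reports whether the copy's pointer targets the copy's array.
bool relocation_keeps_pointer_valid() {
    Frame original;
    Frame moved;
    std::memcpy(&moved, &original, sizeof(Frame));
    return pointer_is_internal(moved);
}
```

Here `relocation_keeps_pointer_valid()` returns `false`: after the copy, `moved.p` still holds the address of `original.a`. Fixing this up would require the compiler to track every interior pointer, which is the added complexity mentioned above.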
Upvotes: 3