Reputation: 21974
C++ uses the global operator new to allocate coroutines by default. But this potentially leaves a lot of performance on the table compared to Rust, which can stack-allocate them. This is disappointing because stack allocation would be fine for most coroutine use: often you co_await a coroutine right when creating it, or immediately co_await a join/combinator of it and several others. You can override operator new and operator delete for a promise type and create a custom allocator that does strict LIFO allocation over some preallocated heap area, but it would still generally be better to reuse the thread's already existing, already hot-in-cache stack.
AFAICT it is impossible to use alloca on the fly for this: any memory it allocated inside operator new would be freed when the operator function returns. You could preallocate a big chunk of space with alloca in some top-level function and then define operator new for the promise type to allocate out of that region, but from a cache-hotness perspective this is effectively the same as having the separate heap-allocated area (all of your coroutines would use a separate, otherwise cold area instead of being intermingled with your regular calls at the real top of the stack).
Is there any way to make alloca work?
There is a related question here about whether you can call alloca inside a coroutine, but I am asking about using it to back the allocation of the coroutine itself (which necessarily happens outside it) before running it.
There is also this question that asks open-endedly whether stackless C++ coroutines are a problem. Some answers there try to justify the design, but they don't mention alloca at all and don't address that Rust's model is an existence proof that stack allocation is possible in principle.
Upvotes: 3
Views: 186
Reputation: 67802
Coroutine state is allocated with the promise class's operator new if one is defined, and with the global operator new if not.
Obviously calling alloca inside any operator new implementation would be unhelpful, and there's no other explicit hook into the allocation that underlies returning a coroutine handle.
However, you may be interested to read on cppreference that:
The call to operator new can be optimized out (even if custom allocator is used) if
- The lifetime of the coroutine state is strictly nested within the lifetime of the caller, and
- the size of coroutine frame is known at the call site.
In that case, coroutine state is embedded in the caller's stack frame (if the caller is an ordinary function) or coroutine state (if the caller is a coroutine).
which suggests that a good implementation doesn't need any explicit code or language support to permit the optimization you want.
Upvotes: 1