xmllmx

Reputation: 42379

What's the special value of `co_yield` in contrast to a simple stateful lambda in C++20?

From the well-known C++ coroutine library (search "Don't allow any use of co_await inside the generator coroutine." in the source file generator.hpp), and from my own experiments, I know that a coroutine that uses co_yield cannot also use co_await.

Since a generator using co_yield must be synchronous, what is the advantage of using co_yield over a simple stateful lambda?

For example:

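#include <iostream>
#include <cppcoro/generator.hpp> // assumed: generator<T> is cppcoro's generator

using cppcoro::generator;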

generator<int> g()
{
    for (auto i = 0; i < 9; ++i)
    {
        co_yield i;
    }
}

int main()
{
    // C++20 requires the empty parameter list before `mutable` (optional only since C++23)
    auto fn_gen = [i = 0]() mutable { return i++; };

    // Lambda way
    for (auto i = 0; i < 9; ++i)
    {
        std::cout << fn_gen() << std::endl;
    }

    // co_yield way
    for (auto i : g())
    {
        std::cout << i << std::endl;
    }
}

What's the special value of co_yield in contrast to a simple stateful lambda in C++20?

Please see the updated MWE: https://godbolt.org/z/x1Yoen7Ys

In the updated example, the output is totally unexpected when using co_await and co_yield in the same coroutine.

Upvotes: 6

Views: 1432

Answers (2)

glades

Reputation: 4829

A stateful lambda or a custom functor is almost always the better choice imho. In fact you can get more efficient coroutines by just using lambdas. Compare this:


#include <cstdio>
#include <cstdint>


int main() {

    enum class cont_point : uint8_t {
        init,
        first,
        second,
        third,
        end,
    };

    auto lambda = [cp = cont_point::init]() mutable -> void {
        switch(cp) {
            case cont_point::init:
                printf("init\n");
                cp = cont_point::first;
                break;
            case cont_point::first:
                printf("first\n");
                cp = cont_point::second;
                break;
            case cont_point::second:
                printf("second\n");
                cp = cont_point::third;
                break;
            case cont_point::third:
                printf("third\n");
                cp = cont_point::end;
                break;
            default:
                return ;
        }
    };
    
    lambda();
    lambda();
    lambda();
    lambda();
}

Yields:

init
first
second
third

If you check the assembly, you will see how thoroughly the code is optimized, which hints at how good compilers are at optimizing lambdas. The same is not true for coroutines (not yet, at least).

But

Coroutines offer one very interesting niche case which no other language construct can fill: they solve the cactus stack problem. The cactus stack problem is that forked paths of execution cannot run on the same stack, so a separate stack must be created for each fork. If the thread executing on that stack then forks again, there must be yet another stack, and so on. What's even worse is that nobody knows in advance how big these stacks are going to be.

C++20 coroutines are stackless. That does not mean they use no stack: they do use one, just not for state that has to persist. Only data that does not live across a suspension point is placed on the executing task's stack, so it can safely be discarded during stack unwinding. All persistent state stays in something called a coroutine frame, which typically (and unfortunately even in simple-to-optimise cases) rests on the heap (allocated via operator new). The compiler decides what goes into the coroutine frame and what goes on the call stack in a process called coroutine transformation. It is this process that makes coroutines uniquely able to solve the cactus stack problem, as follows:

Every newly allocated coroutine instance keeps a predefined amount of space on the heap, comparable to an object with its data fields. When the coroutine is executed, additional data is put on the stack of whatever task is executing the continuation of the coroutine. This way the stack can grow dynamically, and we avoid having many separate stacks that can each overflow (as is the case for stackful coroutines). We only have to make sure all threads have sufficient stack space available to them, as we usually do.

Upvotes: 0

Nicol Bolas

Reputation: 474046

For trivial generators with minimal internal state and code, a small functor or lambda is fine. But as your generator code becomes more complex and requires more state, it becomes less fine. You have to stick more members in your functor type or more captures in your lambda. The code inside the function gets bigger and bigger. And so on.

At the most extreme, a co_yield-based generator can hide all of its implementation details from the outside world, simply by putting its definition in a .cpp file. A stateful functor cannot hide its internal state, because its state consists of members of the type, which the outside world must see. The only way to avoid that is through type-erasure, such as with something like std::function. At which point, you've gained basically nothing over just using co_yield.

Also, co_await can be used with co_yield. Cppcoro's generator type explicitly forbids it, but cppcoro is a library, not the C++20 standard. You can write whatever generator you want, and that generator can support uses of co_await for specific purposes.

Indeed, you can make asynchronous generators, where sometimes you can yield a value immediately, and sometimes you can schedule the availability of a value with some asynchronous process. The code invoking your async generator can co_await on it to extract values from it, rather than treating it like a functor or an iterator pair.

Upvotes: 7
