Don Hosek

Reputation: 1035

Is this a reasonable use case for coroutines?

I'm trying to get my head around the use cases for coroutines and I'm wondering whether this would be a reasonable use case for C++20 coroutines.

I'm writing a library that handles text substitution in a stream of UTF-8 characters. I'm thinking the class would have these methods:

std::u8string parse(std::u8string input_string);
std::u8string flush();

A substitution might be in an unfinished state at the end of a call to parse. For example, if --- is substituted with —, then the sequence of calls

auto a = charsub.parse(u8"and --");
auto b = charsub.parse(u8"- ");
auto c = charsub.parse(u8"--");
auto d = charsub.flush();

would initialize the values of a, b, c and d to "and ", "— ", "" and "--" respectively.

Do I gain anything from implementing this API via coroutines? And if so, what would the code look like for this?

Upvotes: 0

Views: 1073

Answers (1)

Zartaj Majeed

Reputation: 510

Your intuition about solving this with coroutines is right. Text transformation and parsing problems are classic applications of coroutines. Indeed, the example Conway used for the very first coroutine is very similar to your problem.

Coroutines come to mind any time a function needs to maintain state across calls. Before C++20 coroutines, we could use functors or capturing lambdas to solve such problems. What coroutines bring to the table is the ability to maintain not just the state of local data but also the state of the local logic, that is, the point in the function's control flow where it left off.

So coroutine code can be simpler and have a nicer flow than a normal function that examines the state at its entry with every call.
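For comparison, here is a minimal sketch of the functor approach for this specific substitution, with no coroutines. The class name `DashFunctor` is hypothetical, and it uses `std::string` instead of `std::u8string`. Every call to `parse` has to pick up from the saved `pending_` count explicitly:

```cpp
#include <string>

// Hypothetical non-coroutine baseline: the only state kept between calls is
// the number of dashes seen but not yet emitted, stored in a member.
class DashFunctor {
public:
    // Transform one chunk; up to two trailing dashes are held back because
    // they might become part of an em dash in the next chunk.
    std::string parse(const std::string& input) {
        std::string out;
        for (char c : input) {
            if (c == '-') {
                if (++pending_ == 3) {       // three dashes in a row
                    out += "\xe2\x80\x94";   // UTF-8 encoding of U+2014 em dash
                    pending_ = 0;
                }
            } else {
                out.append(pending_, '-');   // fewer than 3 dashes: emit as-is
                pending_ = 0;
                out += c;
            }
        }
        return out;
    }

    // Emit any dashes still held back at the end of input.
    std::string flush() {
        std::string out(pending_, '-');
        pending_ = 0;
        return out;
    }

private:
    std::string::size_type pending_ = 0;  // dashes seen but not yet emitted
};
```

For a single substitution this is short, but each new pattern adds more saved state that must be re-examined on entry, which is the cost the coroutine version avoids.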

Beyond keeping state locally, coroutines let a function offer customization points or hooks similar to a lambda passed in as a callback.

Using lambdas as callbacks is an example of inversion of control. A coroutine-based design, however, is the opposite of inversion of control: a "reversion of control", if you will, back to the client code, which decides when to resume (or co_await) the coroutine and what to do with each result.
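To make the callback side of that contrast concrete, here is a minimal sketch in which the scanner owns the loop and calls back into client code for every token. The function `scanDashes` and its token convention (`"---"` for three dashes in a row, otherwise one character) are hypothetical, and it only handles one complete string, not input split across calls:

```cpp
#include <functional>
#include <string>
#include <string_view>

// Hypothetical callback-style scanner (inversion of control): the scanner
// drives the loop and hands each token to the client's callback.
void scanDashes(std::string_view input,
                const std::function<void(std::string_view)>& onToken)
{
    std::string_view::size_type i = 0;
    while (i < input.size()) {
        if (input.substr(i, 3) == "---") {
            onToken(input.substr(i, 3));  // a run of three dashes
            i += 3;
        } else {
            onToken(input.substr(i, 1));  // any other single character
            ++i;
        }
    }
}
```

The client decides what a `"---"` token becomes, but only from inside the callback, while the scanner controls when that happens.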

For example, instead of yielding a transformed string, the coroutine below could yield single characters or a run of 3 dashes; application code could then replace the 3 dashes with an em dash or something else. These yielded "events" could also be occurrences of other character sequences or patterns. The coroutine's job would be narrowed to scanning the input string, while replacement and transformation would be the responsibility of the client or of another coroutine.

My coroutine solution to your problem is relatively simple and probably does not look that different from a non-coroutine function. The main difference is that it keeps track of pending dashes in the input itself; otherwise that state would need to be kept outside the function.

It uses minimal and straightforward C++ coroutine machinery, except for one thing. Since your parse function takes new input with every call, I need some way to update the input string local to the coroutine. This is not straightforward, and it can be done in different ways.

I chose to make the coroutine co_await on the address of its local input variable (a pointer to the current input string). That address is stored in the coroutine promise. This is achieved by intercepting co_await with an await_transform method on the coroutine's promise type.

Once the promise has the address of the local variable, it can be updated through the public return object.

The local input variable in the coroutine is a pointer to a string rather than a string, which avoids copying each new input.

Accessing a coroutine's local variable this way is a bit hacky. It would be cleaner, though it would take more code, for the coroutine to co_await another coroutine that returns the new input string.

I also avoided u8string, since it's a pain to use; for one thing, in C++20 it cannot be written to std::cout directly.

The code was tested with GCC 11.2 and Visual C++ 2022 (version 17.1).

$ g++ -std=gnu++23 -ggdb3 -O0 -Wall -Werror -Wextra -fcoroutines -o parsedashes parsedashes.cpp
$ parsedashes < <(echo -e "and --\n- \n--")

and
—

--

The complete program

// parsedashes.cpp

#include <stdio.h>
#include <iostream>
#include <coroutine>
#include <string>

using namespace std;

static void usage()
{
  cout << "usage: parsedashes <in.txt" << "\n";
}

The coroutine return object and its public API

struct ReplaceDashes {
  struct Promise;
  using promise_type = Promise;
  coroutine_handle<Promise> coro;

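  ReplaceDashes(coroutine_handle<Promise> h): coro(h) {}

// the return object owns the coroutine frame: allow moves, forbid copies
// so the frame cannot be destroyed twice
  ReplaceDashes(ReplaceDashes&& other) noexcept: coro(other.coro) {
    other.coro = nullptr;
  }
  ReplaceDashes(const ReplaceDashes&) = delete;
  ReplaceDashes& operator=(const ReplaceDashes&) = delete;

  ~ReplaceDashes() {
    if(coro)
      coro.destroy();
  }

// resume the suspended coroutine
  bool next() {
    coro.resume();
    return !coro.done();
  }

// return the value yielded by the coroutine
  string value() const {
    return coro.promise().output;
  }

// set the input string and run the coroutine
  ReplaceDashes& operator()(string* input) {
    *coro.promise().input = input;
    coro.resume();
    return *this;
  }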

Its internal promise object

  struct Promise {
// address of a pointer to the input string
    string** input;
// the transformed output aka yielded value of the coroutine
    string output;

    ReplaceDashes get_return_object() {
      return ReplaceDashes{coroutine_handle<Promise>::from_promise(*this)};
    }

// run coroutine immediately to first co_await
    suspend_never initial_suspend() noexcept {
      return {};
    }

// set yielded value to return
    suspend_always yield_value(string value) {
      output = value;
      return {};
    }
// set returned value to return
    void return_value(string value) {
      output = value;
    }

    suspend_always final_suspend() noexcept {
      return {};
    }

// rethrow so a failure reaches the caller of resume()
// instead of being silently swallowed
    void unhandled_exception() { throw; }
// intercept co_await on the address of the local variable in
// the coroutine that points to the input string
    suspend_always await_transform(string** localInput) {
      input = localInput;
      return {};
    }

  };

};

The actual coroutine function

ReplaceDashes replaceDashes()
{
  string dashes;
  string outstr;

// input is a pointer to a string instead of a string
// this way input string can be changed cheaply
  string* input{};

// pass a reference to local input string to keep in coroutine promise
// this way input string can be set from outside coroutine
  co_await &input;

  for(unsigned i = 0;;) {
    char chr = (*input)[i++];
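// string is consumed: yield the transformed string, or, if the input
// was empty (the end-of-input signal), return any leftover dashes and
// finish. An empty string passed to parse() therefore also ends the coroutine.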
    if(chr == '\0') {
      if(i == 1) {
        co_return dashes;
      }
      co_yield outstr;
// resume to process new input string
      i = 0;
      outstr.clear();
      continue;
    }
// append non-dash after any accumulated dashes
    if(chr != '-') {
      outstr += dashes;
      outstr += chr;
      dashes.clear();
      continue;
    }
// accumulate dashes
    if(dashes.length() < 2) {
      dashes += chr;
      continue;
    }
// replace 3 dashes in a row
// unicode em dash u+2014 '—' is utf8 e2 80 94
    outstr += "\xe2\x80\x94";
    dashes.clear();
  }

}

The parser API

struct Charsub {

  ReplaceDashes replacer = replaceDashes();

  string parse(string& input) {
    return replacer(&input).value();
  }

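// feed an empty string to signal end of input: the coroutine then
// co_returns any leftover dashes. Resuming with next() instead would
// re-read the previous input, which in the driver below is a `line`
// variable that has already gone out of scope.
  string flush() {
    string end;
    return replacer(&end).value();
  }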

};

The driver program

int main(int argc, char* argv[])
{
  (void)argv;

  if(argc > 1) {
    usage();
    return 1;
  }

  Charsub charsub;

  for(string line; getline(cin, line);) {
    cout << charsub.parse(line) << "\n";
  }
  cout << charsub.flush();

}

Upvotes: 2
