Reputation: 1035
I'm trying to get my head around the use cases for coroutines and I'm wondering whether this would be a reasonable use case for C++20 coroutines.
I'm writing a library that handles text substitution in a stream of UTF-8 characters. I'm thinking that I would have methods on the class of:
std::u8string parse(std::u8string input_string);
std::u8string flush();
A substitution might be left in an unfinished state at the end of a call to parse. For example, if there is a substitution of --- to —, then the sequence of calls
auto a = charsub.parse(u8"and --");
auto b = charsub.parse(u8"- ");
auto c = charsub.parse(u8"--");
auto d = charsub.flush();
would initialize the values of a, b, c and d to "and ", "— ", "" and "--" respectively.
Do I gain anything from implementing this API via coroutines? And if so, what would the code look like for this?
Upvotes: 0
Views: 1073
Reputation: 510
Your intuition is right about solving this with coroutines. Text transformation and parsing problems are classic applications of coroutines. Indeed the example used for the very first coroutine by Conway is very similar to your problem.
Coroutines come to mind any time there's a function that needs to maintain state across calls. Before C++ coroutines we could use functors or capturing lambdas to solve such problems. What coroutines bring to the table is the ability to not just maintain the state of local data but also the state of local logic.
So coroutine code can be simpler and have a nicer flow than a normal function that examines the state at its entry with every call.
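For comparison, here is a minimal sketch of the non-coroutine approach described above, where a class keeps the state and every call begins by consulting what the previous call left behind. The name DashReplacer and its members are hypothetical and only illustrate the idea. With a single pattern the explicit state is just a counter, so the difference from a coroutine is small; it grows as more patterns and more "where was I?" bookkeeping are added.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical non-coroutine replacer: the pending dashes live in a member
// variable instead of in a suspended function's local state.
class DashReplacer {
    std::size_t pending_ = 0;  // dashes seen but not yet emitted (0, 1 or 2)

public:
    // Transform one chunk; a trailing run of 1 or 2 dashes is held back.
    std::string parse(std::string_view input) {
        std::string out;
        for (char c : input) {
            if (c == '-') {
                if (++pending_ == 3) {       // third dash in a row
                    out += "\xe2\x80\x94";   // U+2014 em dash encoded as UTF-8
                    pending_ = 0;
                }
                continue;
            }
            out.append(pending_, '-');       // held-back dashes were not a match
            pending_ = 0;
            out += c;
        }
        return out;
    }

    // Emit whatever dashes are still held back at the end of the stream.
    std::string flush() {
        std::string out(pending_, '-');
        pending_ = 0;
        return out;
    }
};
```

Calling parse with "and --", "- " and "--" and then calling flush follows the sequence from the question.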
Beyond keeping state locally, coroutines let a function offer customization points or hooks similar to a lambda passed in as a callback.
Using lambdas as callbacks is an example of inversion of control. A coroutine-based design, however, is the opposite of inversion of control: a "reversion of control", if you will, back to client code that decides when to co_await and what to do with the result.
For example, instead of yielding a transformed string, the coroutine below could yield either a single character or an event for 3 dashes in a row; application code could then replace the 3 dashes with an em dash or something else. The yielded "event" could also be an occurrence of other character sequences or patterns. The coroutine's job would be narrowed to scanning the input string, while replacement and transformation would be the responsibility of another coroutine.
My coroutine solution to your problem is relatively simple and probably does not look that different from a non-coroutine function. The main difference is that it keeps the state of the pending dashes internally, where a normal function would need it to be kept externally.
It uses minimal and straightforward C++ coroutine machinery - except for one thing. Since your parse function takes new input with every call, I somehow need to update the input string local to the coroutine. This is not straightforward and can be done in different ways.
I chose to make the coroutine co_await on the address of the local string variable. That address is stored in the coroutine promise. This is achieved by intercepting co_await with an await_transform method of the coroutine promise.
Once the promise has the address of the local variable, it can be updated through the public return object.
The local string variable in the coroutine is a pointer to avoid unnecessary string copies.
This technique for accessing local variables in the coroutine is a bit hacky. It would be better, though it would require more code, for the coroutine to instead co_await another coroutine that returns the new input string.
I also avoided u8string since it's a pain to use.
The code was tested with gcc 11.2 and vc++ 2022 version 17.1.
g++ -std=gnu++23 -ggdb3 -O0 -Wall -Werror -Wextra -fcoroutines -o parsedashes parsedashes.cpp
$ parsedashes < <(echo -e "and --\n- \n--")
and
—
--
The complete program
// parsedashes.cpp
#include <stdio.h>
#include <iostream>
#include <coroutine>
#include <string>
using namespace std;
static void usage()
{
cout << "usage: parsedashes <in.txt" << "\n";
}
The coroutine return object and its public API
struct ReplaceDashes {
struct Promise;
using promise_type = Promise;
coroutine_handle<Promise> coro;
ReplaceDashes(coroutine_handle<Promise> h): coro(h) {}
~ReplaceDashes() {
if(coro)
coro.destroy();
}
// resume the suspended coroutine
bool next() {
coro.resume();
return !coro.done();
}
// return the value yielded by coroutine
string value() const {
return coro.promise().output;
}
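// set the input string and run coroutine
ReplaceDashes& operator()(string* input) {
// resuming a finished coroutine is undefined behavior, so only
// feed it input while it is still suspended at a co_await/co_yield
if(!coro.done()) {
*coro.promise().input = input;
coro.resume();
}
return *this;
}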
Its internal promise object
struct Promise {
// address of a pointer to the input string
string** input;
// the transformed output aka yielded value of the coroutine
string output;
ReplaceDashes get_return_object() {
return ReplaceDashes{coroutine_handle<Promise>::from_promise(*this)};
}
// run coroutine immediately to first co_await
suspend_never initial_suspend() noexcept {
return {};
}
// set yielded value to return
suspend_always yield_value(string value) {
output = value;
return {};
}
// set returned value to return
void return_value(string value) {
output = value;
}
suspend_always final_suspend() noexcept {
return {};
}
void unhandled_exception() noexcept {}
// intercept co_await on the address of the local variable in
// the coroutine that points to the input string
suspend_always await_transform(string** localInput) {
input = localInput;
return {};
}
};
};
The actual coroutine function
ReplaceDashes replaceDashes()
{
string dashes;
string outstr;
// input is a pointer to a string instead of a string
// this way input string can be changed cheaply
string* input{};
// pass a reference to local input string to keep in coroutine promise
// this way input string can be set from outside coroutine
co_await &input;
for(unsigned i = 0;;) {
char chr = (*input)[i++];
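// operator[] at index size() yields '\0', which marks the end of the input
// an empty input (i == 1 here) signals the final call: return any leftover
// dashes; otherwise yield the transformed string and wait for more input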
if(chr == '\0') {
if(i == 1) {
co_return dashes;
}
co_yield outstr;
// resume to process new input string
i = 0;
outstr.clear();
continue;
}
// append non-dash after any accumulated dashes
if(chr != '-') {
outstr += dashes;
outstr += chr;
dashes.clear();
continue;
}
// accumulate dashes
if(dashes.length() < 2) {
dashes += chr;
continue;
}
// replace 3 dashes in a row
// unicode em dash u+2014 '—' is utf8 e2 80 94
outstr += "\xe2\x80\x94";
dashes.clear();
}
}
The parser API
struct Charsub {
ReplaceDashes replacer = replaceDashes();
string parse(string& input) {
return replacer(&input).value();
}
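string flush() {
// pass an empty string so the coroutine returns any leftover dashes
// merely resuming would re-read the previous input through a pointer
// to a string that may no longer exist
string empty;
return replacer(&empty).value();
}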
};
The driver program
int main(int argc, char* argv[])
{
(void)argv;
if(argc > 1) {
usage();
return 1;
}
Charsub charsub;
for(string line; getline(cin, line);) {
cout << charsub.parse(line) << "\n";
}
cout << charsub.flush();
}
Upvotes: 2