Strandify inter coorporating objects for multithread support

Question

My current application owns multiple «activatable» objects*. My intent is to "run" all those object in the same io_context and to add the necessary protection in order to toggle from single to multiple threads (to make it scalable)

If these objects were completely independent from each others, the number of threads running the associated io_context could grow smoothly. But since those objects need to cooperate, the application crashes in multithread despite the strand in each object.

Let's say we have objects of type A and type B, all of them served by the same io_context. Each of those types run asynchronous operations (timers and sockets - their handlers are surrounded with bind_executor(strand, handler)), and can build a cache based on information received via sockets and posted operations to them. Objects of type A needs to get information cached from multiple instances of B in order to perform their own work.

Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?

If not, what strategy could be adopted to achieve the scalability?

I already tried playing with futures but that strategy leads unsurprisingly to deadlocks.

Thanx

(*) Maybe I'm wrong in the terminology: objects get a reference to an io_context and own their own strand, so I think they are activatable, because they don't own really a running thread

sehe · Accepted Answer

You're mixing vague words a bit. "Activatable", "Strandify", "inter coorporating". They're all close to meaningful concepts, yet, narrowly avoid binding to any precise meaning.

Deconstructing

Let's simplify using more precise concepts.

Let's say we have objects of type A and type B, all of them served by the same io_context

I think it's more fruitful to say "types A and B have associated executors". When you make sure all operations on A and B operate from that executor and you make sure that executor serializes access, then you basically get the Active Object pattern.

[can build a cache based on information received via sockets] and posted operations to them

That's interesting. I take that to mean you don't directly call members of the class, unless they defer the actual execution to the strand. This, again, would be the Active Object.

However, your symptoms suggest that not all operations are "posted to them". Which implies they run on arbitrary threads, leading to your problem.

Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?

The key to your problems is here. Data dependencies. It's also, ;ole;y going to limit the usefulness of scaling, unless of course the generation of information to retrieve from other threads is a computationally expensive operation.

However, in the light of the phrase _"to get information cached from multiple instances of B'" suggests that in fact, the data is instantaneous, and you'll just be paying synchronization costs for accessing across threads.

Questions

Q. Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?

Technically, yes. By making sure all operations go on the strand, and the objects become true active objects.

However, there's an important caveat: strands aren't zero-cost. Only in certain contexts they can be optimized (e.g. in immediate continuations or when the execution context has no concurrency).

But in all other contexts, they end up synchronizing at similar cost as mutexes. The purpose of a strand is not to remove the lock contention. Instead it rather allows one to declaratively specify the synchronization requirements for tasks, so that so that the same code can be correctly synchronized regardless of the methods of async completion (using callbacks, futures, coroutines, awaitables, etc) or the chosen execution context(s).

Example: I recently uncovered a vivid illustration of the cost of strand synchronization even in a simple context (where serial execution was already implicitly guaranteed) here:

sehe mar 15, 23:08 Oh cool. The strands were unnecessary. I add them for safety until I know it's safe to go without. In this case the async call chains form logical strands (there are no timers or full duplex sockets going on, so it's all linear). That... improves the situation :) Now it's 3.5gbps even with the 1024 byte server buffer

The throughput increased ~7x from just removing the strand.

Q. If not, what strategy could be adopted to achieve the scalability?

I suspect you really want caches that contain shared_futures. So that the first retrieval puts the future for the result in cache, where subsequent retrievals get the already existing shared future immediately.

If you make sure your cache lookup datastructure is threadsafe, likely with a reader/writer lock (shared_mutex), you will be free to access it with minimal overhead from any actor, instead of requiring to go through individual strands of each producer.

Keep in mind that awaiting futures is a blocking operation. So, if you do that from tasks posted on the execution context, you may easily run out of threads. In such cases it maybe better to provide async_get in terms of boost::asio::async_result or boost::asio::async_completion so you can wait in non-blocking fashion.

Strandify inter coorporating objects for multithread support

Answers (1)

Deconstructing

Questions

Related Questions