Reputation: 116
I am trying to build a generic task system where I can post tasks that get executed on whatever thread is free. With my previous attempt I often ran out of threads because they would block at some point. So I am trying Boost fibers: when one fiber blocks, the thread is free to work on some other fiber, which sounds perfect.
The work-stealing algorithm seems ideal for my purpose, but I am having a very hard time using it. In the example code, the fibers are created first and only then are the threads and schedulers created, so the fibers do get executed across all the threads. But I want to start fibers later, and by then all the other threads are suspended indefinitely because they had no work. I have not found any way to wake them up again; all my fibers are executed only on the main thread. "notify" seems to be the method to call, but I don't see any way to actually get hold of an instance of an algorithm.
I tried keeping pointers to all instances of the algorithm so I could call notify(), but that doesn't really help; most of the time the algorithms in the worker threads cannot steal anything from the main one because the next one is the dispatcher_context.
I could disable "suspend", but then the threads busy-wait, which is not an option.
I also tried the shared_work algorithm and hit the same problem: once a thread cannot find a fiber, it never wakes up again. I tried the same hack of manually calling notify(), with the same result: very unreliable.
I tried using the channels, but AFAICT, if a fiber is waiting for it, the current context just "hops" over and runs the waiting fiber, suspending the current one.
In short: I find it very hard to reliably run a fiber on another thread. When profiling, I see that most threads are just waiting on a condition_variable, even though I created tons of fibers.
As a small test case, I am trying:
std::vector<boost::fibers::future<int>> v;
for (auto i = 0; i < 16; ++i)
    v.emplace_back(boost::fibers::async([i] {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return i;
    }));
int s = 0;
for (auto &f : v)
    s += f.get();
I am intentionally using this_thread::sleep_for to simulate the CPU being busy.
With 16 threads I would expect this code to run in 1s, but it mostly ends up taking 16s. I was able to get this specific example to actually run in 1s by hacking around, but none of the approaches felt "right" and none of them worked for other scenarios; each one had to be hand-crafted for one specific scenario.
I think this example should just work as expected with a work_stealing algorithm; what am I missing? Is it just a misuse of fibers? How could I implement this reliably?
Thanks, Dix
Upvotes: 2
Views: 1909
Reputation: 2109
Boost.Fiber contains an example using the work_stealing algorithm (examples/work_stealing.cpp).
You have to install the algorithm on each worker thread that should handle/steal fibers.
boost::fibers::use_scheduling_algorithm< boost::fibers::algo::work_stealing >( 4); // 4 worker-threads
Before you process tasks/fibers, you have to wait until all worker threads have been registered with the algorithm. The example uses a barrier for this purpose.
You also need an indication that all work/tasks have been processed, for instance using a condition variable.
Take a look at the section "Running with worker threads" in the Boost.Fiber documentation.
Upvotes: 1