Reputation: 210745
Let's say I have a computationally intensive algorithm running.
For example, let's say it's a routing algorithm, and in a window running on a separate thread, I want to show the user which routes are currently being analyzed and such, and for whatever reason, the algorithm contains heavily CPU-intensive code.
The important thing is that I don't want to slow down the worker thread just for the sake of displaying progress; it needs to run at full-speed as much as possible. It is perfectly OK if the user sees stale data, such as an in-between that didn't actually occur (say, two active routes at once), because this progress visualization is for informational purposes only, and nothing else.
From a theoretical standpoint, I think that according to the C++ standard, my best option is to use std::atomic with std::memory_order_relaxed on both threads. But that would slow down the code on the worker thread noticeably.
From a practical standpoint, though, I'm just tempted to ignore std::atomic altogether, and just have the worker thread work with all the variables normally. Who cares if the GUI thread reads stale data? I don't, and presumably neither will the user. In reality it won't matter because there is only one worker thread, and only that thread needs to observe valid writes, which in practice is the only thing that'll happen.
What I'm wondering about is:
What is the best way to solve this kind of problem, both in theory and in practice?
Do people just ignore the standard and go for raw primitives, or do they bite the bullet and take the performance hit of using std::atomic?
Or are there other facilities I'm not aware of for solving this problem?
Upvotes: 1
Views: 189
Reputation: 6577
Skipping the proper fences on std::atomic wouldn't buy you much, but you would risk losing the communication between the threads completely, mostly because of the compiler. On the hardware side the problem does not exist at all on x86, for example, because every store to memory (if you can ensure that your compiler emits it as expected) already has the required store-with-release semantics.
Also, I doubt that sharing the progress more often than 30-100 times per second (FPS or Hz) brings any value. On the other hand, it can certainly put an unnecessary burden on system resources (if repeated in a tight loop) and break compiler optimizations, e.g. vectorization.
So, if the overhead for the worker thread is the concern, share the info less frequently. E.g. update the atomic counter once every 1024 iterations:
    // worker thread
    if( i%1024 == 0 ) // update the progress info
        my_atomic_progress.store( i, std::memory_order_release ); // regular `mov` on x86

    // GUI thread
    auto i = my_atomic_progress.load( std::memory_order_consume );
This example also shows the minimal fences necessary to establish the communication. Without them, the compiler is free to optimize the memory operations, for example by hoisting them out of a loop.
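To make the throttling idea concrete, here is a minimal, self-contained sketch. The names g_progress, run_worker, poll_progress and demo_progress are hypothetical, and the loop body is only a stand-in for the real routing work. It assumes that the counter is the only thing shared between the threads, so it uses std::memory_order_relaxed as the question suggests; if the GUI also had to read other data written before the counter, release/acquire ordering would be the safer choice.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical shared progress counter (illustrative name).
std::atomic<std::size_t> g_progress{0};

// Worker: runs the heavy loop and publishes progress once every 1024 iterations.
std::size_t run_worker(std::size_t iterations) {
    std::size_t checksum = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        checksum += i * i;  // stand-in for the real CPU-heavy work
        if (i % 1024 == 0)
            g_progress.store(i, std::memory_order_relaxed);
    }
    // Publish the final value so the reader can tell the work is done.
    g_progress.store(iterations, std::memory_order_relaxed);
    return checksum;
}

// GUI side: intended to be called from a timer at display rate.
std::size_t poll_progress() {
    return g_progress.load(std::memory_order_relaxed);
}

// Runs the worker on its own thread and polls until it finishes.
// Returns true if every polled value was at least the previous one
// and the last value equals the iteration count.
bool demo_progress(std::size_t iterations) {
    g_progress.store(0, std::memory_order_relaxed);
    std::thread worker([iterations] { run_worker(iterations); });

    std::size_t last = 0;
    bool monotonic = true;
    while (last < iterations) {
        const std::size_t now = poll_progress();
        if (now < last) monotonic = false;
        last = now;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    worker.join();
    return monotonic && last == iterations;
}
```

In a real GUI the polling loop would be replaced by whatever timer the toolkit provides; the 1 ms sleep here only keeps the sketch from spinning.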
Upvotes: 4
Reputation: 52689
There is no best way; it depends on how much data you need to send to the display. If it's just a single long integer value, and the display is completely non-guaranteed, then I'd just write the value and have done with it. Occasionally the reader will read a corrupted value, but it won't matter so I won't care.
Otherwise, I'd be tempted to send the value to a queue and use an event or condition variable to trigger the read afterwards (as often you do not want the reader running full tilt, and you need some way to inform it there is new data to read)
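A rough sketch of the queue-plus-condition-variable approach might look like the following. ProgressQueue and demo_queue are hypothetical names invented for illustration, the payload is a plain int rather than real route data, and std::optional requires C++17. Note that this trades the worker's speed for delivery guarantees, since every push takes a lock.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical progress queue: the worker pushes, the reader blocks in pop().
class ProgressQueue {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        ready_.notify_one();  // wake the reader: new data is available
    }

    // Tells the reader that no more values will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a value is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<int> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        int value = items_.front();
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
    bool closed_ = false;
};

// The calling thread plays the worker; a second thread plays the display.
std::vector<int> demo_queue(int updates) {
    ProgressQueue queue;
    std::vector<int> seen;
    std::thread reader([&] {
        while (auto value = queue.pop()) seen.push_back(*value);
    });
    for (int i = 0; i < updates; ++i) queue.push(i);
    queue.close();
    reader.join();
    return seen;
}
```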
I'm not sure the overhead of std::atomic is that great. Isn't it going to be implemented with the OS primitives anyway? If so, the primitives (on Windows, on x86 at least, via the InterlockedExchange function) end up as a single CPU instruction after the compiler and optimiser have done their thing.
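One way to check this on a given platform is to ask the standard library whether the atomic type is lock-free, i.e. implemented with CPU instructions rather than a hidden lock. This is only a sketch: the function name is hypothetical, int is an assumed counter type, and is_always_lock_free requires C++17.

```cpp
#include <atomic>

// Hypothetical helper: reports whether an atomic int counter is lock-free
// on the current platform (a runtime query on a concrete object).
bool progress_counter_is_lock_free() {
    std::atomic<int> counter{0};
    return counter.is_lock_free();
}

// Compile-time variant (C++17): true only if every std::atomic<int>
// object is guaranteed to be lock-free on this target.
constexpr bool kCounterAlwaysLockFree = std::atomic<int>::is_always_lock_free;
```

Being lock-free says nothing about how expensive a particular memory ordering is, so measuring the actual worker loop is still the more reliable guide.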
Upvotes: 0