MetallicPriest

Reputation: 30825

Efficient Memory Barriers

I have a multithreaded application, where each thread has a variable of integer type. These variables are incremented during execution of the program. At certain points in the code, a thread compares its counting variable with those of the other threads.

Since threads running on a multicore machine may execute memory operations out of order, a thread might not read the expected counter values of the other threads. One way to solve this is to use atomic variables, such as C++11's std::atomic<>. However, performing a memory fence at each counter increment will significantly slow down the program.

What I want is for a memory fence to be issued only when a thread is about to read the other threads' counters, so that the counters of all threads are updated in memory at that point. How can this be done in C++? I am using Linux and g++.

Upvotes: 4

Views: 2121

Answers (6)

janneb

Reputation: 37238

You could try something like the signal-theft limit counter design in Section 4.4.3 of http://mirror.nexcess.net/kernel.org/linux/kernel/people/paulmck/perfbook/perfbook.2011.08.28a.pdf

This kind of design can eliminate the atomic operations from the fastpath (incrementing the per-thread counter). Whether the complexity is worth it is up to you to decide, of course.

Upvotes: 0

Offirmo

Reputation: 19870

Why not have a "control" thread, to which each thread reports its counter increments and from which it requests the values of the others?

It would make it very efficient and simple. Just a suggestion.

Upvotes: 0

R. Martinho Fernandes

Reputation: 234654

The C++11 standard library includes support for fences in <atomic> with std::atomic_thread_fence.

Calling this invokes a full fence:

std::atomic_thread_fence(std::memory_order_seq_cst);

If you want to emit only an acquire or only a release fence, you can use std::memory_order_acquire and std::memory_order_release instead.
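As a rough sketch of how this could fit the per-thread counters from the question: each thread updates its own counter with a relaxed load and a release store (a plain store on x86, with no fence instruction), and a reader does relaxed loads followed by a single acquire fence. The names kThreads, kIncrements, PaddedCounter, increment, sum_all and run_workers are made up for illustration, and the sketch assumes each counter is written only by its owning thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kThreads = 4;          // hypothetical number of worker threads
constexpr int kIncrements = 100000;  // hypothetical increments per thread

// One counter per thread, aligned to 64 bytes (an assumed cache-line size)
// so that neighbouring counters do not share a line.
struct alignas(64) PaddedCounter {
    std::atomic<int> value{0};
};

PaddedCounter counters[kThreads];

// Fast path: only the owning thread writes counters[self], so a relaxed load
// plus a release store is enough; no atomic read-modify-write is needed.
void increment(int self) {
    assert(self >= 0 && self < kThreads);
    int v = counters[self].value.load(std::memory_order_relaxed);
    counters[self].value.store(v + 1, std::memory_order_release);
}

// Slow path: read every counter with relaxed loads, then issue one acquire
// fence. The fence pairs with the release stores whose values were read.
int sum_all() {
    int total = 0;
    for (auto& c : counters) {
        total += c.value.load(std::memory_order_relaxed);
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return total;
}

// Spawns the workers, lets each bump its own counter, and waits for them.
void run_workers() {
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([t] {
            for (int i = 0; i < kIncrements; ++i) increment(t);
        });
    }
    for (auto& w : workers) w.join();
}
```

While the workers are still running, sum_all() returns a snapshot that may already be slightly stale; the fence orders the reader's later accesses but does not force other threads to publish newer values sooner.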

Upvotes: 5

stefaanv

Reputation: 14392

My suggestion would be to have a collectTimers() function in a higher-level class that can ask each thread for its counter (via a queue or message). This way, updating the timers is not delayed, but collecting them is a bit slower.

This only works if you have some kind of communication mechanism between the threads.

Upvotes: 0

Puppy

Reputation: 147036

There are x86 intrinsics that correspond to memory barriers that you can use yourself. The Windows header has a memory barrier macro, so you should be able to find something equivalent for Linux.
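For g++ on Linux, one possible equivalent is the compiler builtin __sync_synchronize(), which issues a full barrier, or on x86 with SSE2 the _mm_mfence() intrinsic from <emmintrin.h>. The sketch below only shows how such a barrier might be called; full_barrier, g_payload and publish_and_read are hypothetical names. Note that under the C++11 memory model these barriers do not make concurrent access to plain variables well defined, so std::atomic with std::atomic_thread_fence remains the portable route.

```cpp
#include <cassert>

#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_mfence (x86 with SSE2 only)
#endif

// Hypothetical wrapper: issue a full hardware memory barrier.
inline void full_barrier() {
#if defined(__SSE2__)
    _mm_mfence();          // x86 MFENCE instruction
#else
    __sync_synchronize();  // GCC builtin full barrier, any architecture
#endif
}

int g_payload = 0;  // illustrative variable, touched by one thread only here

// Single-threaded demonstration of where the barrier would be placed:
// the write is ordered before anything that follows the barrier.
int publish_and_read(int v) {
    g_payload = v;
    full_barrier();
    return g_payload;
}
```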

Upvotes: 1

EddieBytes

Reputation: 1343

You can use boost::asio::strand for this exact purpose. Create a handler responsible for reading the counter. That handler can be called from multiple threads. Instead of calling the handler directly, wrap it inside a boost::asio::strand. This ensures that the handler cannot be called concurrently by multiple threads.

http://www.boost.org/doc/libs/1_35_0/doc/html/boost_asio/tutorial/tuttimer5.html

I hope I understood the question right.

Upvotes: 0
