Reputation: 2715
I encountered the following implementation of Singleton's get_instance
function:
template<typename T>
T* Singleton<T>::get_instance()
{
    static std::unique_ptr<T> destroyer;
    T* temp = s_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (temp == nullptr)
    {
        std::lock_guard<std::mutex> lock(s_mutex);
        temp = s_instance.load(std::memory_order_relaxed); /* re-read s_instance under the lock */
        if (temp == nullptr)
        {
            temp = new T;
            destroyer.reset(temp);
            std::atomic_thread_fence(std::memory_order_release);
            s_instance.store(temp, std::memory_order_relaxed);
        }
    }
    return temp;
}
And I was wondering - is there any value in the acquire and release memory barriers there? As far as I know, memory barriers prevent the reordering of memory operations on two different variables. Let's take the classic example:
(This is all in pseudo-code - don't be caught on syntax)
# Thread 1
while(f == 0);
print(x)
# Thread 2
x = 42;
f = 1;
In this case, we want to prevent the reordering of the 2 store operations in Thread 2, and the reordering of the 2 load operations in Thread 1. So we insert barriers:
# Thread 1
while(f == 0);
acquire_fence
print(x)
# Thread 2
x = 42;
release_fence
f = 1;
But in the above code, what is the benefit of the fences?
The main difference between those cases, as I see it, is that in the classic example we use memory barriers because we deal with 2 variables - so we have the "danger" of Thread 2 storing f before storing x, or alternatively the danger in Thread 1 of loading x before loading f.
But in my Singleton code, what is the possible memory reordering that the memory barriers aim to prevent?
I know there are other (and maybe better) ways to achieve this; my question is for educational purposes - I'm learning about memory barriers and curious to know whether they are useful in this particular case. Please ignore everything else that isn't relevant to that.
Upvotes: 4
Views: 465
Reputation: 6707
The complexity of this pattern (known as double-checked locking, or DCLP) is that data synchronization can happen in two different ways (depending on when a reader accesses the singleton), and the two kind of overlap.
But since you're asking about fences, let's skip the mutex part.
But in my Singleton code, what is the possible memory reordering that the memory barriers aim to prevent?
This is not very different from your pseudo code, where you already noticed that the acquire and release fences are necessary to guarantee the outcome of 42: f is used as the signalling variable, and x had better not be reordered with it.
In the DCL pattern, the first thread gets to allocate memory: temp = new T;
The memory temp is pointing at is going to be accessed by other threads, so it must be synchronized (i.e. made visible to those threads).
The release fence followed by the relaxed store guarantees that the new operation is ordered before the store, such that other threads will observe the same order.
Thus, once the pointer is written to the atomic s_instance and other threads read the address from it, they will also have visibility of the memory it is pointing at.
The acquire fence does the same thing in the opposite direction: it guarantees that everything sequenced after the relaxed load and the fence (i.e. the accesses to the object) cannot be reordered before the load. This way, a thread that observes a non-null pointer is also guaranteed to observe the fully constructed object; allocating the memory in one thread and using it in another will not overlap.
In another answer, I tried to visualize this with a diagram.
Note that these fences always come in pairs; a release fence without a matching acquire is meaningless. You could also use (and mix) fences with release/acquire operations:
s_instance.store(temp, std::memory_order_release); // no standalone fence necessary
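Putting that together, the same DCLP can be written with release/acquire operations instead of standalone fences (a sketch assuming the class layout from the question; the member definitions at the bottom are illustrative):

```cpp
#include <atomic>
#include <memory>
#include <mutex>

template <typename T>
class Singleton
{
public:
    static T* get_instance()
    {
        static std::unique_ptr<T> destroyer;                  // frees T at program exit
        T* temp = s_instance.load(std::memory_order_acquire); // pairs with the release store
        if (temp == nullptr)
        {
            std::lock_guard<std::mutex> lock(s_mutex);
            temp = s_instance.load(std::memory_order_relaxed); // the mutex already orders this
            if (temp == nullptr)
            {
                temp = new T;
                destroyer.reset(temp);
                s_instance.store(temp, std::memory_order_release); // publishes the construction
            }
        }
        return temp;
    }

private:
    static std::atomic<T*> s_instance;
    static std::mutex s_mutex;
};

template <typename T> std::atomic<T*> Singleton<T>::s_instance{nullptr};
template <typename T> std::mutex Singleton<T>::s_mutex;
```

The acquire load pairs with the release store exactly as the standalone fences do; on most architectures the operation form compiles to the same or cheaper instructions.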
The cost of DCLP is that every use (in every thread) involves an acquire load, which at a minimum requires an unoptimized load (i.e. a load from the L1 cache). This is why static objects in C++11 (possibly implemented with DCLP) can be slower than in C++98 (which had no memory model).
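For comparison, the C++11 guarantee for function-local statics ("magic statics") lets the compiler generate that check for you; a minimal sketch of the alternative (function name is illustrative):

```cpp
// C++11 guarantees that a function-local static is initialized exactly once,
// even under concurrent first calls; compilers typically emit a DCLP-like
// guard check around the initialization.
template <typename T>
T& get_instance()
{
    static T instance; // thread-safe initialization since C++11
    return instance;
}
```

Every call still pays for the guard check (conceptually the same acquire load as in the hand-written DCLP), which is the cost referred to above.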
For more information about DCLP, check this article from Jeff Preshing.
Upvotes: 3