Reputation: 18218
EDIT: Okay, I've got a specific question. I want to implement 'exchange' functionality with acquire and release semantics (pseudo-code):
interlocked_inc_32(target)
{
    mov ecx, 1
    lea eax, target
    lock xadd [eax], ecx    ; atomically add ecx to [eax]; ecx receives the old value
}
interlocked_inc_32_acq(target)
{
    lfence
    mov ecx, 1
    lea eax, target
    lock xadd [eax], ecx
}
interlocked_inc_32_rel(target)
{
    sfence
    mov ecx, 1
    lea eax, target
    lock xadd [eax], ecx
}
The problem with that is: I have no idea how to implement this. I'm developing under Windows using Microsoft's Visual Studio 2010. Sure, there are "intrin.h" and "Windows.h", which provide exactly these functions / intrinsics. BUT there, InterlockedIncrementAcquire is just a define for InterlockedIncrement and provides a full memory barrier. That's not what I'm after.
/**************************************** original post: /****************************************
I want to write an atomic class like C++0x std::atomic. I just want to make sure my thinking about it is right.
I would like to implement the following code: EDIT (replaced bad implementation)
enum memory_order { memory_order_acquire, memory_order_release, memory_order_acq_rel };
template<class T> class atomic;
template<class atomic_type, std::size_t = sizeof(typename ExtractType<atomic_type>::type)> struct interlocked;
template<template<class> class atomic_type> struct interlocked<atomic_type, 1>
{
typedef typename ExtractType<atomic_type>::type bit8_type;
void store(bit8_type value, memory_order order = memory_order_acq_rel) volatile {
interlocked_xchg_8<order>(&static_cast<atomic_type volatile*>(this)->m_value, value);
}
bit8_type load(memory_order order = memory_order_acq_rel) const volatile
{
// Compare-exchange the value with itself and return what was read.
return interlocked_cmp_xchg_8<order>(
const_cast<bit8_type volatile*>(&static_cast<volatile const atomic_type *const>(this)->m_value),
static_cast<atomic_type const volatile*>(this)->m_value,
static_cast<atomic_type const volatile*>(this)->m_value
);
}
bit8_type exchange(bit8_type value, memory_order order = memory_order_acq_rel) volatile {
return interlocked_xchg_8<order>(&static_cast<atomic_type volatile*>(this)->m_value, value);
}
bool compare_exchange(bit8_type comperand, bit8_type new_value, memory_order order = memory_order_acq_rel) volatile
{
return interlocked_cmp_xchg_8<order>(
&static_cast<atomic_type volatile*>(this)->m_value,
new_value,
comperand
) == comperand;
}
};
template<template<class> class atomic_type> struct interlocked<atomic_type, 2> { };
template<template<class> class atomic_type> struct interlocked<atomic_type, 4> { };
template<template<class> class atomic_type> struct interlocked<atomic_type, 8> { };
template<class T>
class atomic : public interlocked<atomic<T>> { T m_value; };
Is there anything I'm missing, or is this a reasonably good implementation?
Thanks for any comment. Best regards:
PS: I don't want to start a new question for this: What's the advantage of using boost::uint32_t (in boost\cstdint.h) instead of uint32_t (in stdint.h)?
Upvotes: 3
Views: 1108
Reputation: 21
The problem you are facing here is that the lock prefix implies a full memory barrier (like mfence). That is because earlier x86 processors had no distinct kinds of memory barrier; the separate sfence/lfence instructions were only added later.
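To illustrate the point, here is a minimal sketch of the three increment variants from the question, written against C++11 `<atomic>`. This is an assumption on my part: as far as I know Visual Studio 2010 does not ship `<atomic>`, so it needs a newer toolchain. The function names simply mirror the question's pseudo-code and are not an existing API. On x86 a compiler will typically emit the same `lock xadd` for all three, so the weaker orderings buy nothing there; they only matter on architectures with a weaker memory model.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helpers named after the pseudo-code in the question.
// Each returns the new (incremented) value, like InterlockedIncrement does.

inline std::int32_t interlocked_inc_32(std::atomic<std::int32_t>& target)
{
    // Full barrier: sequentially consistent read-modify-write.
    return target.fetch_add(1, std::memory_order_seq_cst) + 1;
}

inline std::int32_t interlocked_inc_32_acq(std::atomic<std::int32_t>& target)
{
    // Acquire: later loads/stores may not be reordered before this operation.
    return target.fetch_add(1, std::memory_order_acquire) + 1;
}

inline std::int32_t interlocked_inc_32_rel(std::atomic<std::int32_t>& target)
{
    // Release: earlier loads/stores may not be reordered after this operation.
    return target.fetch_add(1, std::memory_order_release) + 1;
}
```

Usage would look like `std::atomic<std::int32_t> counter(0); interlocked_inc_32_acq(counter);`. If you compare the generated assembly for the three functions on an x86 target, you should see for yourself whether your compiler distinguishes them at all.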
Upvotes: 2
Reputation: 92341
Are you targeting x86 hardware? Doesn't its cache synchronization scheme imply that full memory barriers are what you get anyway? How are you trying to improve on that?
Upvotes: 3