Reputation: 9484
I'm working on a lock-free version of the "Simple Segregated Storage" memory pool in C++.
The SSS memory pool is similar to a slab allocator : it's basically just a chunk of memory that is divided into blocks of equal sizes, and we have a free list pointer pointing to the first available block. Allocating is simply moving the pointer up to the next block, and deallocating is simply setting the free list pointer to the deallocated block, and pointing the "next" pointer on the deallocated block to the old value of the freelist pointer.
So it's basically a singly-linked list.
Now, I'm trying to code a lock-free version of the Simple Segregated Storage algorithm. If we assume that SEGREGATING the initial block of memory (i.e. creating the linked list) is always done before entering a multi-threaded environment, we only need to worry about allocating and deallocating blocks - in which case this problem becomes very similar to a lock-free singly linked-list, which is a well understood problem.
So, it seems to me that allocating and deallocating could easily be done in a lock-free manner by using simple compare and swap instructions.
Assuming we have the following free list pointer:
std::atomic<unsigned char*> m_first
And assume we have a helper function, nextof()
, which takes in a block pointer and returns a reference to the "next pointer". The next pointer is actually embedded within the memory block - so the nextof
function looks like this:
unsigned char*& nextof(void* ptr)
{
return *static_cast<unsigned char**>(ptr);
}
That's at least how the Boost library implements Simple Segregated Storage.
Anyway, my attempt at a lock-free, atomic allocate
function is:
void* malloc()
{
unsigned char* first = m_first.load();
if (!first) return nullptr;
while (!m_first.compare_exchange_strong(first, nextof(first)))
{
if (!first) break;
}
return first;
}
So - this seems very straightforward. All we need to do is atomically set m_first
to the next pointer embedded within the block pointed to by m_first
. Initially, I thought this would be as simple as saying m_first.store(nextof(m_first))
- but unfortunately I don't think that's atomic because we're actually doing a store and a load here on the same memory location - we're storing a new value into m_first
, but also loading the value of m_first
in order to get it's next pointer.
So, it seems we need to do a Compare and Swap. In the above code, I atomically load the value of m_first
into a local variable. Then, after checking for null, I atomically change the value of m_first
with a CAS, if and only if it still equals the value of the local variable. If it doesn't, the local variable is updated to whatever m_first
is now, and we keep looping until we're not preempted by some other thread.
The deallocation is a bit trickier - here we need to change the value of m_first
to the deallocated block, and also update the next pointer on the deallocated block to point to what m_first
used to be.
My attempt at this is:
void free(void* chunk)
{
unsigned char* first = m_first.load();
nextof(chunk) = first;
while (!m_first.compare_exchange_strong(first, static_cast<unsigned char*>(chunk)))
{
nextof(chunk) = first;
}
}
I'm not quite sure I have this right, but I think it's correct. Firstly, I atomically load the value of m_first
and store it in a local variable. I then set the next pointer on the block-to-be-free'd to my local copy of m_first
. Of course, by now, another thread could have intervened and set m_first
to something else - so I have to now do a CAS operation to check that m_first
is still what I expect. If it's not, I have to set the next pointer to the appropriate value, and continue looping.
As far as I can see this is all race-condition safe, with a single CAS operation for malloc
and free
.
Question: Since lock-free algorithms are hard, I'm not certain I have this correct. Am I overlooking something here that could lead to a race condition?
Upvotes: 5
Views: 1034
Reputation: 51214
An old answer, but I thought I might give an opinion. Unless I am mistaken, I believe the code might leak data through a version of the ABA problem.
The problem is the evaluation of the parameter nextof(first)
before passing it to the atomic compare_exchange_strong
inside malloc
. If we take the while
loop from the malloc
function:
while (!m_first.compare_exchange_strong(first, nextof(first)))
{
if (!first) return nullptr;
}
and rewrite it to explicitly evaluate nextof(first)
before passing to compare_exchange_strong
, to make it more obvious:
while (true)
{
auto tmp_next = nextof(first);
// this is where the chance for ABA exists
if (m_first.compare_exchange_strong(first, tmp_next))
break;
if (!first) return nullptr;
}
Then there seems to be a possibility for m_first
to change to a different value and back between these two lines, with another thread mutating actual nextof(m_first)
in the meanwhile:
// I will represent the linked list using letters,
// i.e. m_first -> "A" means that "A" is at the head,
// and "A" -> "B" means that "A" points to "B"
void* malloc()
{
// at the start, m_first points to "A": m_first -> "A" -> "B" -> "C"
unsigned char* first = m_first.load();
// --> thread #2 calls malloc() here and acquires "A", and
// removes it, so: m_first -> "B" -> "C"
if (!first) return nullptr;
while (true)
{
unsigned char * tmp_next = nextof(first);
// tmp_next is "B" at this moment, because 'first' is still "A"
// --> some third thread returns some even older object now,
// so m_first -> "M" -> "B" -> "C"
// --> thread #2 now calls free() to return "A", resulting in
// my_first -> "A" -> "M" -> "B" -> "C"
// m_first is now back to "A", but tmp_next points
// to old "B" instead of "M", and the following line succeeds
// and "M" is lost forever: m_first -> "B" -> "C"
if (m_first.compare_exchange_strong(first, tmp_next))
break;
if (!first) return nullptr;
}
return first;
}
The Wikipedia article on ABA contains some suggestions for solving these, like "tagged state reference" (basically adding something like a "transaction counter" to these pointers so that you can detect that the second "A" is not the same as the first one), or "intermediate nodes" which I find slightly more complex to implement (apparently invented by John D. Valois in this article).
Upvotes: 0