Mike Sweeney

Reputation: 2046

Memory Barriers in Single Producer Single Consumer Queue

I've spent the last few weeks doing a lot of reading on memory models, compiler reordering, CPU reordering, memory barriers, and lock-free programming, and I think I've driven myself into confusion. I've written a single producer single consumer queue and am trying to figure out where I need memory barriers for things to work, and whether some operations need to be atomic. My single producer single consumer queue is as follows:

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
} queue_node_t;

// Empty Queue looks like this:
// HEAD TAIL
//   |    |
// dummy_node

// Queue: insert at TAIL, remove from HEAD
//    HEAD           TAIL
//     |              |
// dummy_node -> 1 -> 2 -> NULL

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    queue_node_t *tail; // producer adds at tail
} queue_t;

queue_node_t *alloc_node(int data) {
    queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));
    new_node->data = data;
    new_node->next = NULL;
    return new_node;
}

queue_t *create_queue() {
    queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));
    queue_node_t *dummy_node = alloc_node(0);
    dummy_node->next = NULL;
    new_queue->head = dummy_node;
    new_queue->tail = dummy_node;
    // 1. Do we need any kind of barrier to make sure that if the
    // thread that didn't call this performs a queue operation
    // and happens to run on a different CPU that queue structure
    // is fully observed by it? i.e. the head and tail are properly
    // initialized
    return new_queue;
}

// Enqueue modifies tail
void enqueue(queue_t *the_queue, int data) {
    queue_node_t *new_node = alloc_node(data);
    // insert at tail
    new_node->next = NULL;

    // Let us save off the existing tail
    queue_node_t *old_tail = the_queue->tail;

    // Make the new node the new tail
    the_queue->tail = new_node;

    // 2. Store/Store barrier needed here?

    // Link in the new node last so that a concurrent dequeue doesn't see
    // the node until we're done with it
    // I don't know that this needs to be atomic but it does need to have
    // release semantics so that this isn't visible until prior writes are done
    old_tail->next = the_queue->tail;
    return;
}

// Dequeue modifies head
bool dequeue(queue_t *the_queue, int *item) {
    // 3. Do I need any barrier here to make sure if an enqueue already happened
    // I can observe it? i.e., if an enqueue was called on 
    // an empty queue by thread 0 on CPU0 and dequeue is called
    // by thread 1 on CPU1
    // dequeue the oldest item (FIFO) which will be at the head
    if (the_queue->head->next == NULL) {
        return false;
    }
    *item = the_queue->head->next->data;
    queue_node_t *old_head = the_queue->head;
    the_queue->head = the_queue->head->next;
    free(old_head);
    return true;
}

Here are my questions corresponding to the comments in my code above:

  1. In create_queue() do I need some kind of barrier before I return? I'm wondering if I call this function from thread 0 running on CPU0 and then use the pointer returned in thread 1 which happens to run on CPU1 is it possible thread 1 sees a queue_t structure that isn't fully initialized?
  2. Do I need a barrier in enqueue() to make sure the new node isn't linked in to the queue until all of the new node's fields are initialized?
  3. Do I need a barrier in dequeue()? I feel like it would be correct without one but I may need one if I want to make sure I see any completed enqueue.

Update: I tried to make it clear with the comments in the code but the HEAD of this queue always points to a dummy node. This is a common technique that makes it so that the producer only ever needs to access the TAIL and the consumer only ever accesses the HEAD. An empty queue will contain a dummy node and dequeue() always returns the node after the HEAD, if there is one. As nodes are dequeued the dummy node advances and the previous "dummy" is freed.

Upvotes: 2

Views: 489

Answers (1)

Domso

Reputation: 970

First of all, it depends on your specific hardware architecture, OS, language, etc.

1.) No, because you need an additional barrier (or some other synchronization) to pass the pointer to the other thread anyway; that hand-off is what makes the initialized structure visible.

2.) Yes: old_tail->next = the_queue->tail needs to be executed after the_queue->tail = new_node, so a store/store barrier is needed between them.

3.) It would have no effect there, since there is nothing before it to order. But in theory you could need a barrier after old_tail->next = the_queue->tail in enqueue(). The compiler won't reorder across the function boundary, but the CPU might still reorder the stores (unlikely in practice, but not guaranteed everywhere).

OT: since you are already doing some micro-optimization, you could add padding so the producer's and consumer's fields don't share a cache line:

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    char cache_pad[64]; // head and tail shouldn't share a cache line (assumed 64 bytes)
    queue_node_t *tail; // producer adds at tail
} queue_t;

and if you really have enough memory to spare, you could do something like this:

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
    char cache_pad[56]; // sizeof(queue_node_t) == 64, but only with 32-bit pointers
} queue_node_t;

Upvotes: 0
