Benny
Benny

Reputation: 4321

Thread synchronization issue: possible race, misuse of volatile, cache coherency?

I am getting acquainted with pthreads programming; the following code is a simple producer/consumer design where data is put/fetched from a global list. The problem is: the data pointer in the consumer function is tried to be freed twice. Beside, if I add a printf() at the beginning of the loop, everything seems ok... What am I doing wrong? I suspect a misuse of the volatile keyword, or something hidden by the cache... Unless it's just a design issue (which probably is :p).

Thanks for your insights.

Note: malloc()/free() is thread-safe on my system. I am compiling with $ gcc -pthread -O0 which should, I guess, reduce possible design errors due to misuse of volatile. Finally, we don't care in this code snippet with running out of memory (in case of more producer than consumer).

Edit: Changed code to a single list head.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

pthread_mutex_t lock;
pthread_cond_t new_data;

struct data {
  int i;
  struct data *next;
};

struct data *list_head = NULL;

void *consume(void *arg)
{
  struct data *data;

  while (1) {
    pthread_mutex_lock(&lock);
    while (list_head == NULL) {
      pthread_cond_wait(&new_data, &lock);
    }
    data = list_head;
    list_head = list_head->next;
    pthread_mutex_unlock(&lock);
    free(data);
  }

  return NULL;
}

void *produce(void *arg)
{
  struct data *data;

  while (1) {
    data = malloc(sizeof(struct data));
    pthread_mutex_lock(&lock);
    data->next = list_head;
    list_head = data;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&new_data);
  }

  return NULL;
}

int main()
{
  pthread_t tid[2];
  int i;

  pthread_mutex_init(&lock, NULL);
  pthread_cond_init(&new_data, NULL);
  pthread_create(&tid[0], NULL, consume, NULL);
  pthread_create(&tid[1], NULL, produce, NULL);
  for (i = 0; i < 2; i++) {
    pthread_join(tid[i], NULL);
  }
}

And the output:

$ ./a.out 
*** glibc detected *** ./a.out: double free or corruption (fasttop): 0x00007f5870109000 ***

Upvotes: 4

Views: 323

Answers (4)

user7116
user7116

Reputation: 64068

Consider the following scenario:

  1. produce -> acquired lock
  2. consume -> wait lock
  3. produce -> allocate d0, write_ptr = d0, read_ptr = d0, signal, unlock
  4. consume -> acquired lock
  5. produce -> wait lock
  6. consume -> satisfied condition
  7. consume -> data = d0, read_ptr = NULL, unlock
  8. consume -> free d0
  9. produce -> acquired lock, allocate d1
  10. consume -> wait lock
  11. produce -> write_ptr != null so write_ptr->next = d1
  12. produce -> read_ptr == null so read_ptr = d1

Check out step 11. write_ptr is still d0 even though it was free'd independently of the producer thread. You need to make sure consume does not invalidate write_ptr.

A doubly linked list would allow you to avoid some of these difficulties (as the readers and writers work from separate ends).

Main:

  • Create sentinel nodes HEAD and TAIL, link HEAD and TAIL
  • Spawn producer
  • Spawn consumer

Producer:

  • Lock
  • Create node
  • Link HEAD->next->prev to node
  • Link node->next to HEAD->next
  • Link HEAD->next to node
  • Link node->prev to HEAD
  • Unlock
  • Signal

Consumer:

  • Lock
  • Wait for TAIL->prev != HEAD (do { pthread_cond_wait } while (condition);)
  • data = TAIL->prev
  • Link TAIL->prev to data->prev
  • Link TAIL->prev->next to TAIL
  • Unlock
  • Use data

Upvotes: 4

doron
doron

Reputation: 28872

The error is in the lines

if (NULL != write_ptr)
   write_ptr->next = data;
write_ptr = data;

This should read: if (NULL != write_ptr) data->next = write_ptr; write_ptr = data;

For debugging purposes also make sure that you are getting your expected values out the queue.

Also there is no need for the volatile variables. since your queue is protected by mutexes, the operating system will insure that queue access is atomic. volatile is only needed when accessing memory mapped hardware resources and should never be used for synchronization. All it does is forces data to memory unnecessarily.

There is also one other issue. If your producer is faster than your consumer, you will eventually run out of memory unless you limit the size of the queue and force the producer to wait for the consumer. Smaller queues are more responsive and the only reason to go for larger queues is to smooth out perturbations in the producer.

Upvotes: 1

junix
junix

Reputation: 3211

Update: As Billy ONeal pointed out, pthread functions provide the needed memory barriers, so it's not necessary to declare anything volatile as long as it's protected by pthread locks. (See this question for details: Does guarding a variable with a pthread mutex guarantee it's also not cached?)

But I got another explaination for some odd behavior: Your linked list the producer creates is broken: Assume write_ptr is not NULL and observe the behavior:

/* 1 */ if (NULL != write_ptr)
/* 2 */   write_ptr->next = data;
/* 3 */ write_ptr = data;

Say write_ptr points to previously allocated instance A. A.next is NULL. We newly allocate an instance B and set everything to NULL (therefore: B.next is NULL).

Line 1 evaluates true, therefore line 2 is executed. A.next is now pointing to B. After executing line 3, write_ptr points to B B.next is NULL => A is lost what results into a memory leak.

But for now I can't see why the libc complains about double free.

Upvotes: 1

Zan Lynx
Zan Lynx

Reputation: 54325

I believe that your problem is in the following lines:

if (NULL != write_ptr)
  write_ptr->next = data;
write_ptr = data;
if (NULL == read_ptr)
  read_ptr = data;

I don't think you are building your list correctly. In fact, I don't understand why you have two lists. But anyway...

I assume that you want to add your new data to the head of the list. Otherwise you would need a tail pointer or you would need to chase to the end of the list each time.

To do this you need to add the current list head as the next item of your data. It should look like:

data->next = write_ptr;
write_ptr = data;

There is no need for a NULL check on write_ptr.

Upvotes: 4

Related Questions