Reputation: 38217
Consider the following example. The goal is to use two threads, one to "compute" a value and one that consumes and uses the computed value (I've tried to simplify this). The computing thread signals to the other thread that the value has been computed and is ready by using a condition variable, after which the waiting thread consumes the value.
// Hopefully this is free from errors, if not, please point them out so I can fix
// them and we can focus on the main question
#include <pthread.h>
#include <stdio.h>
// The data passed to each thread. These could just be global variables.
typedef struct ThreadData
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int spaceHit;
} ThreadData;
// The "computing" thread... just asks you to press space and checks if you did or not
void* getValue(void* td)
{
    ThreadData* data = td;
    pthread_mutex_lock(&data->mutex);
    printf("Please hit space and press enter\n");
    data->spaceHit = getchar() == ' ';
    pthread_cond_signal(&data->cond);
    pthread_mutex_unlock(&data->mutex);
    return NULL;
}
// The "consuming" thread... waits for the value to be set and then uses it
void* watchValue(void* td)
{
    ThreadData* data = td;
    pthread_mutex_lock(&data->mutex);
    if (!data->spaceHit)
        pthread_cond_wait(&data->cond, &data->mutex);
    pthread_mutex_unlock(&data->mutex);
    if (data->spaceHit)
        printf("You hit space!\n");
    else
        printf("You did NOT hit space!\n");
    return NULL;
}
int main()
{
    // Boring main function. Just initializes things and starts the two threads.
    pthread_t threads[2];
    pthread_attr_t attr;
    ThreadData data;
    data.spaceHit = 0;
    pthread_mutex_init(&data.mutex, NULL);
    pthread_cond_init(&data.cond, NULL);
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create(&threads[0], &attr, watchValue, &data);
    pthread_create(&threads[1], &attr, getValue, &data);
    pthread_join(threads[0], NULL);
    pthread_join(threads[1], NULL);
    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&data.mutex);
    pthread_cond_destroy(&data.cond);
    return 0;
}
My main question has to do with potential optimizations done by the compiler. Is the compiler allowed to do tricky optimizations and "optimize" the program flow such that the following happens:
void* watchValue(void* td)
{
    ThreadData* data = td;
    pthread_mutex_lock(&data->mutex);
    if (!data->spaceHit) // Here, it might remember the result of data->spaceHit
        pthread_cond_wait(&data->cond, &data->mutex);
    pthread_mutex_unlock(&data->mutex);
    if (remember the old result of data->spaceHit without re-getting it)
        printf("You hit space!\n");
    else
        printf("You did NOT hit space!\n");
    // The above if statement now may not execute correctly because it didn't
    // re-get the value of data->spaceHit, but "remembered" the old result
    // from the if statement a few lines above
    return NULL;
}
I'm a bit paranoid that the compiler's static analysis might determine that data->spaceHit does not change between the two if statements, and thus justify using the old value of data->spaceHit instead of re-reading the new value. I don't know enough about threading and compiler optimizations to know if this code is safe or not. Is it?
Note: I've written this in C, and tagged this as C and C++. I'm using this in a C++ library, but since I'm using C threading APIs (pthreads and Win32 threads) and have the option to embed C in this portion of the C++ library, I've tagged this as both C and C++.
Upvotes: 11
Views: 1526
Reputation: 58500
Generally speaking, sharing data between threads raises not only compiler optimization issues, but also hardware optimization issues when those threads run on different processors that can execute instructions out of order.
However, the pthread_mutex_lock and pthread_mutex_unlock functions must defeat not only compiler caching optimizations, but also any hardware reordering optimizations. If a thread A prepares some shared data and then "publishes" it by performing an unlock, this must appear consistent to other threads. For example, it cannot appear on another processor that the lock is released, but the updates of the shared variables have not yet completed. So the functions have to execute any necessary memory barriers. All of that would be for naught if the compiler could move data accesses around the calls to these functions, or cache things at the register level such that coherency is broken.
So the code you have is safe from that perspective. However, it has other issues. The pthread_cond_wait function should always be called in a loop that re-tests the variable, because spurious wakeups can occur for any reason.
The signaling of a condition is stateless, so the waiting thread can block forever. Just because you call pthread_cond_signal unconditionally in the getValue input thread doesn't mean that watchValue will fall through the wait. It's possible that getValue executes first, and spaceHit is not set. Then watchValue enters the mutex, sees that spaceHit is false, and executes a wait which may be indefinite. (The only thing that will save it is a spurious wakeup, ironically, because there is no loop.)
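For illustration, a condition-variable version that avoids both problems would wait on a separate "data is ready" predicate and re-test it in a loop while holding the mutex. A minimal sketch of that idea (the ready field is an illustrative addition to ThreadData, not part of the original code):

void* watchValue(void* td)
{
    ThreadData* data = td;
    pthread_mutex_lock(&data->mutex);
    while (!data->ready)                       // loop guards against spurious wakeups
        pthread_cond_wait(&data->cond, &data->mutex);
    int spaceHit = data->spaceHit;             // copy the shared value while still locked
    pthread_mutex_unlock(&data->mutex);
    if (spaceHit)
        printf("You hit space!\n");
    else
        printf("You did NOT hit space!\n");
    return NULL;
}

The producer would set data->ready = 1 (while holding the mutex) before signaling, so the waiter cannot block forever regardless of which thread runs first.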
Basically the logic you seem to be looking for is a simple semaphore:
// Consumer:
wait(data_ready_semaphore);
use(data);
// Producer:
data = produce();
signal(data_ready_semaphore);
In this style of interaction, we do not need a mutex, which is hinted at by the unprotected use of data->spaceHit in your watchValue. More concretely, with POSIX semaphore syntax:
// "watchValue" consumer
sem_wait(&ready_semaphore);
if (data->spaceHit)
printf("You hit space!\n");
else
printf("You did NOT hit space!\n");
// "getValue" producer
data->spaceHit = getchar() == ' ';
sem_post(&ready_semaphore);
Perhaps the real code that you simplified to the example can just use semaphores.
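For completeness, here is a sketch of how the whole example might look with a POSIX unnamed semaphore in place of the mutex/condition pair. This is based on the snippets above, not code from the original post; sem_wait and sem_post are memory-synchronizing calls, so the read of spaceHit after the wait is safe:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

typedef struct ThreadData
{
    sem_t ready;   // counts how many values are ready to be consumed
    int spaceHit;
} ThreadData;

void* getValue(void* td)
{
    ThreadData* data = td;
    printf("Please hit space and press enter\n");
    data->spaceHit = getchar() == ' ';
    sem_post(&data->ready);    // publish: the value is now available
    return NULL;
}

void* watchValue(void* td)
{
    ThreadData* data = td;
    sem_wait(&data->ready);    // blocks until the producer posts
    if (data->spaceHit)
        printf("You hit space!\n");
    else
        printf("You did NOT hit space!\n");
    return NULL;
}

int main()
{
    pthread_t threads[2];
    ThreadData data;
    data.spaceHit = 0;
    sem_init(&data.ready, 0, 0);   // unnamed semaphore, initial count 0
    pthread_create(&threads[0], NULL, watchValue, &data);
    pthread_create(&threads[1], NULL, getValue, &data);
    pthread_join(threads[0], NULL);
    pthread_join(threads[1], NULL);
    sem_destroy(&data.ready);
    return 0;
}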
P.S. Also, pthread_cond_signal need not be called while the mutex is held. It potentially calls into the operating system, so a mutex-protected region that only needs to be a handful of machine instructions long just to protect the shared variables can blow up to hundreds of machine cycles because it contains the signaling call.
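Applied to the getValue from the question, that suggestion would look roughly like this (a sketch; the only change is moving the signal after the unlock, while the shared flag is still updated under the mutex):

void* getValue(void* td)
{
    ThreadData* data = td;
    pthread_mutex_lock(&data->mutex);
    printf("Please hit space and press enter\n");
    data->spaceHit = getchar() == ' ';
    pthread_mutex_unlock(&data->mutex);   // keep the locked region as short as possible
    pthread_cond_signal(&data->cond);     // signaling does not require holding the mutex
    return NULL;
}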
Upvotes: 6
Reputation: 153338
(Later edit: it looks like subsequent answers have provided a better answer to this question. I'll leave this answer here as a reference for an approach that does not answer the question as well. Please advise if you recommend a different approach.)
Type ThreadData itself is not volatile. The instantiation of it as 'data' in main() is volatile. The pointers 'data' in getValue() and watchValue() also point to volatile versions of a ThreadData.
Although I like the first answer for its tightness, rewriting
ThreadData data;            // main()
ThreadData* data;           // getValue(), watchValue()
to
volatile ThreadData data;   // main()
volatile ThreadData* data;  // getValue(), watchValue()
// Pointer `data` is not volatile; what it points to is volatile.
may be better. It will ensure that any access to the members of ThreadData is always re-read and not optimized away. Should you add additional fields to ThreadData, they are likewise protected.
Upvotes: -1
Reputation: 239011
No, the compiler is not allowed to cache the value of data->spaceHit across the calls to pthread_cond_wait() or pthread_mutex_unlock(). These are both specifically called out as "functions [which] synchronize memory with respect to other threads", which must necessarily act as compiler barriers.
For a compiler to be part of a conforming pthreads implementation, it must not perform that optimisation in the case you've given.
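To make that concrete, here is the relevant part of watchValue again, with comments (added for illustration, not from the answer) marking why the second read must be fresh:

pthread_mutex_lock(&data->mutex);
if (!data->spaceHit)                              // first read of the shared flag
    pthread_cond_wait(&data->cond, &data->mutex); // memory-synchronizing call
pthread_mutex_unlock(&data->mutex);               // memory-synchronizing call
if (data->spaceHit)                               // must be re-read, not a cached register value
    printf("You hit space!\n");
else
    printf("You did NOT hit space!\n");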
Upvotes: 9