Ta Thanh Dinh
Ta Thanh Dinh

Reputation: 638

Concurrent threads and data race

I'm reading a paper of Boehm: "Threads Cannot be Implemented as a Library" but struggling with an example about why library-based solution for thread does not help.

The example gives an initialization zero for both x and y then there are two threads

// thread 0
if (x == 1) ++y;

// thread 1
if (y == 1) ++x;

I understand that there is no data race because, under sequential consistency, any thread will always see each other as the comparison (i.e. data load) occurs before the assignment (i.e. data store).

So there is no way, for example, the following situation occurs:

// thread 0 assigns y
++y;            // that means x is already 1

// thread 1 is checking
if (y == 1)     // that means x is still 0

Consequently, so the outcome x = 0; y = 0 is warranted.

However a compiler, which awares single-threaded code only, is free to reorder these threads as

// thread 0
++y; if (x != 1) --y;

// thread 1
++x; if (y != 1) --x;

then there is data race, for example

// thread 0 stores y
--y;

// thread 1 load y
if (y != 1)

But I still do not understand why a thread library cannot help, the most trivial implementation that I can think of is simply using a mutex to avoid data race:

int x, y = 0;
mtx_t m;

int thread0(void *arg)
{
  mtx_lock(&m); if (x == 1) ++y; mtx_unlock(&m);
  return y;
}

int thread1(void *arg)
{
  mtx_lock(&m); if (y == 1) ++x; mtx_unlock(&m);
  return x;
}

int main()
{
  thrd_t t0, t1;
  int r0, r1;

  mtx_init(&m, mtx_plain);

  thrd_create(&t0, thread0, NULL);
  thrd_create(&t1, thread1, NULL);

  thrd_join(t0, &r0);
  thrd_join(t1, &r1);

  printf("x = %d, y = %d\n", r1, r0);

  return 0;
}

There is a similar question, and the answer is about circular definition of data-race in pthreads. But with the implementation above, I don't care about whatever the memory model is, the compiler is free to reorder the code if it wants, but the output x = 0, y = 0 is always warranted.

Do I misunderstand anything?

Upvotes: 1

Views: 176

Answers (1)

mevets
mevets

Reputation: 10445

While it is likely that you misunderstand some thing, your analysis is fine. The paper referenced overplays its thesis.

The UNIX and Linux kernels (amongst many others) themselves are large multi-threaded programs which operate with only (the equivalent of) library-based thread support. These large multi-threaded programs have exhibited a shocking performance, reliability and scalability from tiny pdp-based computers to massive supercomputers.

A Java-based OS was produced by Sun Labs to the open ridicule of all who had the opportunity to give it a whirl. It remains in an unmarked grave.

The secondary line of reasoning, that busy-waiting is more effective than locking primitives had been kicking around for at least a decade before that paper. Everybody loves lockless because it makes great benchmarks, whereas unbounded non-deterministic race condition scares people who want nice safe systems. The thing is, sometimes racy compare-and-swap (CAS) is good, sometimes it is bad. Some clever systems use an optimistic CAS to implement mutexes, leaving to opportunity for somewhat readable code, and good benchmarks.

Again, the bold statement of impossibility is hyperbolic, based on the idea that the compiler is capricious, so will make menacing assumptions and over-write memory at will. Thankfully, the “free and good enough” technologies slew these dragons.

Upvotes: 3

Related Questions