pythonic

Reputation: 21615

Is this enough to detect race conditions?

Say I have a multithreaded application and I run it with the same inputs. Is it enough to instrument every load and store to detect write-write and write-read data races? I mean, from the logged load and store addresses, if we can see which thread did which load and which thread did which store, we can detect write-read and write-write data races by noticing the overlapping addresses. Or am I missing something?
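To make the question concrete, here is a minimal sketch of the detector being described. The log format and names are hypothetical; it flags any address touched by more than one thread where at least one access is a write:

```python
from collections import defaultdict

def find_conflicts(log):
    """log: list of (thread, op, addr) entries, op is 'r' or 'w'.
    Report addresses accessed by more than one thread where at
    least one of the accesses is a write."""
    accesses = defaultdict(list)          # addr -> [(thread, op), ...]
    for thread, op, addr in log:
        accesses[addr].append((thread, op))
    conflicts = {}
    for addr, entries in accesses.items():
        threads = {t for t, _ in entries}
        has_write = any(op == 'w' for _, op in entries)
        if len(threads) > 1 and has_write:
            conflicts[addr] = entries
    return conflicts

log = [
    ('T1', 'r', 0x1000),
    ('T1', 'w', 0x1000),
    ('T2', 'r', 0x1000),   # overlaps T1's write -> flagged
    ('T2', 'r', 0x2000),   # single thread, read-only -> not flagged
]
print(find_conflicts(log))
```

Note that this flags overlapping accesses only; it knows nothing about locks or ordering, which is where the answers push back.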

Upvotes: 1

Views: 203

Answers (3)

Employed Russian

Reputation: 213526

Or am I missing something?

You are missing a lot. As Pubby said, if you see a read, then a write in T1, and later a read, then a write in T2, you can't say anything about the absence of races. You need to know about the locks involved.

You may want to use a tool, such as Google's ThreadSanitizer instead.

Update:

But will my approach cover all races or at least some of the races?

Your comments here and on other answers appear to show that you don't understand what a race is.

Your approach may expose some of the races, yes. It is also guaranteed to miss most of them (which makes the exercise futile).
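To see what "you need to know about locks" means in practice, the trace would have to record lock acquire/release events too. Here is a toy sketch in the spirit of the classic lockset idea (event format and names are made up for illustration; real tools like ThreadSanitizer use more sophisticated happens-before tracking): an address is suspicious if no single lock is held across all of its accesses.

```python
def lockset_check(events):
    """Simplified lockset-style check: an address shared by several
    threads is reported if the intersection of the locks held at
    every access to it is empty."""
    held = {}        # thread -> set of locks currently held
    candidate = {}   # addr -> locks held at every access so far
    users = {}       # addr -> threads that accessed it
    for ev in events:
        kind = ev[0]
        if kind == 'lock':
            _, thread, lock = ev
            held.setdefault(thread, set()).add(lock)
        elif kind == 'unlock':
            _, thread, lock = ev
            held[thread].discard(lock)
        else:  # ('access', thread, addr)
            _, thread, addr = ev
            locks = held.get(thread, set())
            candidate[addr] = candidate.get(addr, locks) & locks
            users.setdefault(addr, set()).add(thread)
    return {a for a, c in candidate.items()
            if not c and len(users[a]) > 1}

safe = [('lock', 'T1', 'm'), ('access', 'T1', 0x1000), ('unlock', 'T1', 'm'),
        ('lock', 'T2', 'm'), ('access', 'T2', 0x1000), ('unlock', 'T2', 'm')]
racy = [('access', 'T1', 0x1000), ('access', 'T2', 0x1000)]
print(lockset_check(safe))   # set(): consistently protected by 'm'
print(lockset_check(racy))   # {4096}: no common lock
```

With addresses alone, `safe` and `racy` would look identical; the lock events are what separate them.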

Upvotes: 5

Pubby

Reputation: 53047

Here is a simple example from Wikipedia that I have slightly modified:

As a simple example, let us assume that two threads, T1 and T2, each want to perform arithmetic on the value of a shared global integer. Ideally, the following sequence of operations would take place:

  1. Integer i = 0; (memory)
  2. T1 reads the value of i from memory into register1: 0
  3. T1 increments the value of i in register1: (register1 contents) + 1 = 1
  4. T1 stores the value of register1 in memory: 1
  5. T2 reads the value of i from memory into register2: 1
  6. T2 multiplies the value of i in register2: (register2 contents) * 2 = 2
  7. T2 stores the value of register2 in memory: 2
  8. Integer i = 2; (memory)

In the case shown above, the final value of i is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:

  1. Integer i = 0; (memory)
  2. T1 reads the value of i from memory into register1: 0
  3. T2 reads the value of i from memory into register2: 0
  4. T1 increments the value of i in register1: (register1 contents) + 1 = 1
  5. T2 multiplies the value of i in register2: (register2 contents) * 2 = 0
  6. T1 stores the value of register1 in memory: 1
  7. T2 stores the value of register2 in memory: 0
  8. Integer i = 0; (memory)

The final value of i is 0 instead of the expected result of 2. This occurs because the read-modify-write sequences of the second case are not mutually exclusive. Mutually exclusive operations are those that cannot be interrupted while accessing some resource, such as a memory location. In the first case, T1 was not interrupted while accessing the variable i, so its operation was mutually exclusive.

Each of these individual operations is atomic. The race condition occurs because this particular order does not have the same semantics as the first. How do you prove the semantics are not the same as the first? Well, you know they are different for this case, but to determine that you have no race conditions you would need to check every possible order. This is a very hard thing to do: the number of orders grows combinatorially with the number of operations, so it can't be checked reliably.
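For this tiny example the orders *can* still be enumerated exhaustively, which makes the blow-up visible. A sketch (simulating the two threads above; every interleaving keeps each thread's operations in program order):

```python
from itertools import combinations

# T1: read i -> r1; r1 += 1; write r1 -> i
# T2: read i -> r2; r2 *= 2; write r2 -> i
T1 = [('read', 'r1'), ('add', 'r1'), ('write', 'r1')]
T2 = [('read', 'r2'), ('mul', 'r2'), ('write', 'r2')]

def run(order):
    """Execute one interleaving against shared memory i = 0."""
    mem, regs = 0, {}
    for op, reg in order:
        if op == 'read':
            regs[reg] = mem
        elif op == 'add':
            regs[reg] += 1
        elif op == 'mul':
            regs[reg] *= 2
        else:  # write
            mem = regs[reg]
    return mem

# Choose which 3 of the 6 slots belong to T1: C(6, 3) = 20 interleavings.
finals = set()
for slots in combinations(range(6), 3):
    order, a, b = [], iter(T1), iter(T2)
    for pos in range(6):
        order.append(next(a) if pos in slots else next(b))
    finals.add(run(order))

print(finals)   # {0, 1, 2}: only 2 matches the intended T1-then-T2 result
```

Even here there are 20 interleavings for 6 operations; with realistic operation counts and more threads the number explodes, which is why exhaustive checking does not scale.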

What happens if a certain order never halts? How do you even know it will never halt in the first place? You're basically left with the halting problem, which is impossible to solve in general.

If you're talking about using consecutive reads or writes to determine the race, then observe this:

  1. Integer i = 0; (memory)
  2. T2 reads the value of i from memory into register2: 0
  3. T2 multiplies the value of i in register2: (register2 contents) * 2 = 0
  4. T2 stores the value of register2 in memory: 0
  5. T1 reads the value of i from memory into register1: 0
  6. T1 increments the value of i in register1: (register1 contents) + 1 = 1
  7. T1 stores the value of register1 in memory: 1
  8. Integer i = 1; (memory)

This has the same read/store pattern as the first but gives different results.
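A small simulation of the two serial orders (T1-then-T2 from the first list, T2-then-T1 from this one) makes that point concrete: the sequence of reads and stores on address i is identical, yet the final values differ, so the access pattern alone cannot tell you which outcome was intended.

```python
def run(order):
    """order: list of (thread, op) on a shared integer i = 0.
    T1 computes i + 1, T2 computes i * 2. Returns the read/store
    pattern (ignoring thread identity) and the final value."""
    mem, regs, pattern = 0, {'T1': 0, 'T2': 0}, []
    for thread, op in order:
        if op == 'read':
            regs[thread] = mem
            pattern.append('read')
        else:  # compute and store
            regs[thread] = regs[thread] + 1 if thread == 'T1' else regs[thread] * 2
            mem = regs[thread]
            pattern.append('store')
    return pattern, mem

first = [('T1', 'read'), ('T1', 'store'), ('T2', 'read'), ('T2', 'store')]
third = [('T2', 'read'), ('T2', 'store'), ('T1', 'read'), ('T1', 'store')]

p1, v1 = run(first)
p3, v3 = run(third)
print(p1 == p3)   # True: identical read/store pattern
print(v1, v3)     # 2 1: different final values
```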

Upvotes: 2

Klaas van Gend

Reputation: 1128

The most obvious thing you'll learn is that there are several threads using the same memory. That's not necessarily bad in itself.

Good uses would include protection by semaphores, atomic access and mechanisms like RCU or double buffering.

Bad uses would include race conditions, and true and false sharing:

  • Race conditions mostly stem from ordering issues: if a certain task A writes something at the end of its execution whereas task B needs that value at its start, you'd better make sure that B's read only happens after A has completed. Semaphores, signals, or similar are a good solution to this. Or run both in the same thread, of course.
  • True sharing means that two or more cores are aggressively reading and writing the same memory address. This slows down the processor, as it constantly has to propagate any changes to the caches of the other cores (and to memory, of course). Your approach could catch this, but probably not highlight it.
  • False sharing is even more subtle than true sharing: processor caches do not work on single bytes but on "cache lines", which hold more than one value. If core A keeps hammering byte 0 of a line whereas core B keeps writing to byte 4, the cache-line updating will still stall both cores.
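The logged addresses could at least surface candidates for both kinds of sharing. A rough sketch (assuming a 64-byte cache line and the same hypothetical log format as the question): group write accesses by cache line and distinguish writes to the same address from writes to different addresses on one line.

```python
from collections import defaultdict

LINE = 64  # assumed cache-line size in bytes

def sharing_candidates(log):
    """log: (thread, op, addr) entries. Report cache lines written by
    more than one thread, distinguishing true sharing (same address)
    from possible false sharing (distinct addresses, same line)."""
    lines = defaultdict(lambda: defaultdict(set))  # line -> addr -> threads
    for thread, op, addr in log:
        if op == 'w':
            lines[addr // LINE][addr].add(thread)
    report = {}
    for line, addrs in lines.items():
        threads = set().union(*addrs.values())
        if len(threads) > 1:
            same_addr = any(len(t) > 1 for t in addrs.values())
            report[line] = 'true sharing' if same_addr else 'possible false sharing'
    return report

log = [('T1', 'w', 0x1000), ('T2', 'w', 0x1000),   # same address
       ('T1', 'w', 0x2000), ('T2', 'w', 0x2004)]   # distinct addresses, one line
print(sharing_candidates(log))
```

As the answer notes, such a log can flag the suspect lines but says nothing about how aggressively they actually ping-pong between caches at run time.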

Upvotes: 1
