Reputation: 605
In his excellent treatise on threading in C#, Joseph Albahari proposed the following simple program to demonstrate why we need to use some form of memory fencing around data that is read and written by multiple threads. The program never ends if you compile it in Release mode and run it without a debugger attached:
static void Main()
{
bool complete = false;
var t = new Thread(() =>
{
bool toggle = false;
while (!complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
complete = true;
t.Join(); // Blocks indefinitely
}
My question is: why does the following slightly modified version of the above program no longer block indefinitely?
class Foo
{
public bool Complete { get; set; }
}
class Program
{
static void Main()
{
var foo = new Foo();
var t = new Thread(() =>
{
bool toggle = false;
while (!foo.Complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
foo.Complete = true;
t.Join(); // No longer blocks indefinitely!!!
}
}
Whereas the following still blocks indefinitely:
class Foo
{
public bool Complete;// { get; set; }
}
class Program
{
static void Main()
{
var foo = new Foo();
var t = new Thread(() =>
{
bool toggle = false;
while (!foo.Complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
foo.Complete = true;
t.Join(); // Still blocks indefinitely!!!
}
}
As does the following:
class Program
{
static bool Complete { get; set; }
static void Main()
{
var t = new Thread(() =>
{
bool toggle = false;
while (!Complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
Complete = true;
t.Join(); // Still blocks indefinitely!!!
}
}
Upvotes: 6
Views: 3242
Reputation: 123642
To expand on Eric Petroelje's answer.
If we rewrite the program as follows (the behaviour is identical, but avoiding the lambda function makes it easier to read the disassembly), we can disassemble it and see what it actually means to "cache the value of a field in a register":
class Foo
{
public bool Complete; // { get; set; }
}
class Program
{
static Foo foo = new Foo();
static void ThreadProc()
{
bool toggle = false;
while (!foo.Complete) toggle = !toggle;
Console.WriteLine("Thread done");
}
static void Main()
{
var t = new Thread(ThreadProc);
t.Start();
Thread.Sleep(1000);
foo.Complete = true;
t.Join();
}
}
We get the following behaviour:
            | Foo.Complete is a Field | Foo.Complete is a Property
x86-RELEASE | loops forever           | completes
x64-RELEASE | completes               | completes
In x86-release, the CLR JIT compiles the while(!foo.Complete) into this code:
Complete is a field:
004f0153 a1f01f2f03 mov eax,dword ptr ds:[032F1FF0h] # Put a pointer to the Foo object in EAX
004f0158 0fb64004 movzx eax,byte ptr [eax+4] # Put the value pointed to by [EAX+4] into EAX (this basically puts the value of .Complete into EAX)
004f015c 85c0 test eax,eax # Is EAX zero? (is .Complete false?)
004f015e 7504 jne 004f0164 # If it is not, exit the loop
# start of loop
004f0160 85c0 test eax,eax # Is EAX zero? (is .Complete false?)
004f0162 74fc je 004f0160 # If it is, goto start of loop
The last 2 lines are the problem. If eax is zero, then it will just sit there in an infinite loop saying "is EAX zero?", without any code ever changing the value of eax!
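To make the effect concrete, here is a rough C# analogue of the two loop shapes. It is only a single-threaded sketch under stated assumptions: the names HoistingSketch, SpinHoisted and SpinRereading are made up for illustration, the write at iteration 10 merely stands in for the other thread setting the flag, and the iteration cap exists so the demo terminates. It is not what the JIT literally emits.

```csharp
using System;

class Foo
{
    public bool Complete;
}

static class HoistingSketch
{
    // Analogue of the x86 field case: the field is read once before the loop,
    // and afterwards only the local copy is tested.
    public static bool SpinHoisted(Foo foo, int maxIterations)
    {
        bool cached = foo.Complete;            // single read, like the one movzx
        for (int i = 0; i < maxIterations; i++)
        {
            if (cached) return true;           // like "test eax,eax"
            if (i == 10) foo.Complete = true;  // stand-in for the other thread's write
        }
        return false;                          // the write was never observed
    }

    // Analogue of the property case: the field is re-read on every iteration.
    public static bool SpinRereading(Foo foo, int maxIterations)
    {
        for (int i = 0; i < maxIterations; i++)
        {
            if (foo.Complete) return true;     // like "cmp byte ptr [eax+4],0"
            if (i == 10) foo.Complete = true;  // stand-in for the other thread's write
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine($"hoisted:    {SpinHoisted(new Foo(), 1000)}");    // False
        Console.WriteLine($"re-reading: {SpinRereading(new Foo(), 1000)}");  // True
    }
}
```

The hoisted version never notices the change because nothing inside the loop refreshes its local copy, which is the same situation as the register that is never reloaded.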
Complete is a property:
00220155 a1f01f3a03 mov eax,dword ptr ds:[033A1FF0h] # Put a pointer to the Foo object in EAX
0022015a 80780400 cmp byte ptr [eax+4],0 # Compare the value at [EAX+4] with zero (is .Complete false?)
0022015e 74f5 je 00220155 # If it is, goto 2 lines up
This actually looks like nicer code. The JIT has inlined the property getter into a direct read of the Complete field (otherwise you'd see some call instructions going off to other functions). But because it does not cache the value in a register in this case, the loop it generates re-reads the memory on every iteration, rather than just pointlessly re-testing the register.
In x64-release, the 64-bit CLR JIT compiles the while(!foo.Complete) into this code:
Complete is a field:
00140245 48b8d82f961200000000 mov rax,12962FD8h # put 12962FD8h into RAX. 12962FD8h is a pointer-to-a-pointer in some .NET static object table
0014024f 488b00 mov rax,qword ptr [rax] # Follow the above pointer; puts a pointer to the Foo object in RAX
00140252 0fb64808 movzx ecx,byte ptr [rax+8] # Add 8 to the pointer to Foo object (it now points to the .Complete field) and put that value in ECX
00140256 85c9 test ecx,ecx # Is ECX zero ? (is the .Complete field false?)
00140258 751b jne 00140275 # If nonzero/true, exit the loop
0014025a 660f1f440000 nop word ptr [rax+rax] # Do nothing!
# start of loop
00140260 48b8d82f961200000000 mov rax,12962FD8h # put 12962FD8h into RAX. 12962FD8h is a pointer-to-a-pointer in some .NET static object table
0014026a 488b00 mov rax,qword ptr [rax] # Follow the above pointer; puts a pointer to the Foo object in RAX
0014026d 0fb64808 movzx ecx,byte ptr [rax+8] # Add 8 to the pointer to Foo object (it now points to the .Complete field) and put that value in ECX
00140271 85c9 test ecx,ecx # Is ECX zero? (is the .Complete field false?)
00140273 74eb je 00140260 # If zero/false, go to start of loop
Complete is a property:
00140250 48b8d82fe11200000000 mov rax,12E12FD8h # put 12E12FD8h into RAX. 12E12FD8h is a pointer-to-a-pointer in some .NET static object table
0014025a 488b00 mov rax,qword ptr [rax] # Follow the above pointer; puts a pointer to the Foo object in RAX
0014025d 0fb64008 movzx eax,byte ptr [rax+8] # Add 8 to the pointer to Foo object (it now points to the .Complete field) and put that value in EAX
00140261 85c0 test eax,eax # Is EAX 0 ? (is the .Complete field false?)
00140263 74eb je 00140250 # If zero/false, go to the start
The 64-bit JIT does the same thing for both properties and fields, except that for a field it has "unrolled" the first iteration of the loop: this basically puts an if(foo.Complete) { jump past the loop code } in front of it, for some reason.
In both cases, it's doing a similar thing to the x86 JIT when dealing with a property:
- It inlines the method to a direct memory read
- It doesn't cache it, and re-reads the value each time
I'm not sure whether the 64-bit CLR is not allowed to cache the field value in a register like the 32-bit one does, but if it is allowed, it's not bothering to do so. Perhaps it will in future?
At any rate, this illustrates how the behaviour is platform dependent and subject to change. I hope it helps :-)
Upvotes: 0
Reputation: 60498
In the first example Complete is a member variable and could be cached in a register for each thread. Since you aren't using locking, updates to that variable may not be flushed to main memory and the other thread will see a stale value for that variable.
In the second example, where Complete is a property, you are actually calling a function on the Foo object to return a value. My guess would be that while simple variables may be cached in registers, the compiler may not always optimize actual properties that way.
EDIT:
Regarding the optimization of automatic properties - I don't think there is anything guaranteed by the specification in that regard. You are essentially banking on whether the compiler/runtime will be able to optimize out the getter/setter.
In the case where it is on the same object, it seems like it does. In the other case, it seems like it does not. Either way, I wouldn't bet on it. The easiest way to solve this would be to use a simple member variable and mark it as volatile to ensure that it is always synced with main memory.
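A minimal sketch of the volatile suggestion, based on the field version from the question. The wrapper class VolatileSketch, the Run method, the background-thread setting and the 5-second Join timeout are additions for illustration only, so the demo reports a result instead of hanging if the flag were not seen.

```csharp
using System;
using System.Threading;

class Foo
{
    // volatile: the loop below must re-read the field on every iteration
    // instead of reusing a value held in a register.
    public volatile bool Complete;
}

static class VolatileSketch
{
    // Returns true if the worker noticed the flag and exited within the timeout.
    public static bool Run()
    {
        var foo = new Foo();
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!foo.Complete) toggle = !toggle;
        });
        t.IsBackground = true;  // a stuck worker cannot keep the process alive
        t.Start();
        Thread.Sleep(100);
        foo.Complete = true;
        return t.Join(5000);    // bounded wait instead of blocking forever
    }

    static void Main()
    {
        Console.WriteLine(Run() ? "worker exited" : "worker still looping");
    }
}
```

Note that volatile can only be applied to fields, not to locals or auto-properties, which is why the sketch uses the plain-field shape of Foo.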
Upvotes: 7
Reputation: 51329
The other answers explain what happens in technically correct terms. Let me see if I can explain it in English.
The first example says "Loop until this variable location is true." The new thread creates a copy of that variable location (because it is a value type) and proceeds to loop forever. If the variable had happened to be a reference type, it would have made a copy of the reference, but since the reference happened to point to the same memory location it would have worked.
The second example says "Loop until this method (the getter) returns true." The new thread cannot create a copy of a method, so it creates a copy of the reference to the instance of the class in question, and repeatedly calls the getter on that instance until it returns true (repeatedly reading the same variable location that is set to true in the main thread).
The third example is the same as the first. The fact that the closed variable happens to be a member of another class instance is not relevant.
Upvotes: 3
Reputation: 41236
This is because in the first snippet you provided, you made a lambda expression that closed over the boolean value complete - so, when the compiler rewrites that, it captures a copy of the value, not a reference. Likewise, in the second one, it's capturing a reference instead of a copy, due to closing over the Foo object, and thus when you change the underlying value, the change is noticed because of the reference.
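One way to check what a C# lambda captures is a small single-threaded test like the one below (CaptureSketch and LambdaSeesLaterWrite are hypothetical names used only for this sketch). Here the lambda does observe an assignment made to the local after the lambda was created, which suggests the captured local is shared with the lambda rather than copied, so it may be worth weighing this explanation against the JIT-based ones in the other answers.

```csharp
using System;

static class CaptureSketch
{
    public static bool LambdaSeesLaterWrite()
    {
        bool complete = false;
        Func<bool> read = () => complete; // closes over the local variable
        complete = true;                  // assigned after the lambda was created
        return read();                    // reads the captured variable
    }

    static void Main()
    {
        Console.WriteLine(LambdaSeesLaterWrite()); // True
    }
}
```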
Upvotes: 5