Reputation: 215577
I'm having a horrible time trying to debug a race condition that causes a deadlock in my locking primitives, which are based on Linux futexes and atomic ops. Here's the code I'm working with (exact same logic as the real code, just with the dependency on data structures that aren't relevant to the problem pulled out):
int readers, writer, waiting;

void wrlock()
{
    int cur;
    while (atomic_swap(&writer, 1)) spin();
    while ((cur=readers)) futex_wait(&readers, cur);
}

void wrunlock()
{
    atomic_store(&writer, 0);
    if (waiting) futex_wake(&writer, ALL);
}

void rdlock()
{
    atomic_inc(&waiting);
    for (;;) {
        atomic_inc(&readers);
        if (!writer) return;
        atomic_dec(&readers);
        futex_wait(&writer, 1);
    }
}

void rdunlock()
{
    atomic_dec(&waiting);
    atomic_dec(&readers);
    if (writer) futex_wake(&readers, ALL);
}
The atomic_* and spin functions are pretty obvious. The Linux futex ops are futex_wait(int *mem, int val) and futex_wake(int *mem, int how_many_to_wake).
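(In case it's useful context: those two are thin wrappers over the raw futex syscall. A rough sketch of what they look like follows; it's not necessarily byte-for-byte what my code does, just the shape of it.)

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of the wrappers: futex_wait sleeps only if *mem still equals val
 * (the check and the sleep are atomic with respect to futex_wake), and
 * futex_wake wakes up to cnt threads sleeping on mem. */
static void futex_wait(int *mem, int val)
{
    syscall(SYS_futex, mem, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void futex_wake(int *mem, int cnt)
{
    syscall(SYS_futex, mem, FUTEX_WAKE, cnt, NULL, NULL, 0);
}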
The deadlock condition I'm running into is 3 threads with readers==0, writer==1, waiting==2, and all threads blocked in futex_wait. I don't see how this can happen.
And for everyone who wants to yell at me for not using pthread primitives, please save it for another question. This is low-level code which operates without dependence on glibc/libpthread. In any case I think the question is probably useful to others for learning about low-level concurrency black magic, or maybe to scare anyone else away from trying to write code like this... ;-)
By the way, the hardware is x86, so even if there are memory-ordering issues with the code, I don't think they'd manifest as bugs. I'm guessing there's just a subtle misuse of futexes I'm missing, especially since the code works fine when all the waits are dummied out as spins.
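(For anyone who wants to poke at it: a stress harness along these lines should be enough to hit the bad state. This is only a sketch; the pthread calls and the reader_thread/writer_thread names are just for the test harness, the locking code itself has no pthread dependency.)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hammer the lock from 2 reader threads and 1 writer thread, dumping the
 * state once a second so the wedged readers==0, writer==1, waiting==2
 * condition is visible when it happens. */
static void *reader_thread(void *arg)
{
    (void)arg;
    for (;;) {
        rdlock();
        rdunlock();
    }
    return NULL;
}

static void *writer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        wrlock();
        wrunlock();
    }
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    pthread_create(&t[0], NULL, reader_thread, NULL);
    pthread_create(&t[1], NULL, reader_thread, NULL);
    pthread_create(&t[2], NULL, writer_thread, NULL);
    for (;;) {
        sleep(1);
        printf("readers=%d writer=%d waiting=%d\n", readers, writer, waiting);
    }
}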
Here's the generated asm for wrlock (basically identical to the version I posted, except that it calls a separate function lock for the first spinlock). The additional conditional return at the beginning is "if we're not running multiple threads, bail out". 0x338 and 0x33c correspond to readers and writer. The call 1af is actually a relocation for a call to futex_wait, which is external.
00000184 <wrlock>:
184: a1 18 00 00 00 mov 0x18,%eax
189: 55 push %ebp
18a: 85 c0 test %eax,%eax
18c: 89 e5 mov %esp,%ebp
18e: 75 02 jne 192 <wrlock+0xe>
190: c9 leave
191: c3 ret
192: 68 3c 03 00 00 push $0x33c
197: e8 7e fe ff ff call 1a <lock>
19c: 58 pop %eax
19d: a1 38 03 00 00 mov 0x338,%eax
1a2: 85 c0 test %eax,%eax
1a4: 74 ea je 190 <wrlock+0xc>
1a6: 6a 01 push $0x1
1a8: 50 push %eax
1a9: 68 38 03 00 00 push $0x338
1ae: e8 fc ff ff ff call 1af <wrlock+0x2b>
1b3: a1 38 03 00 00 mov 0x338,%eax
1b8: 83 c4 0c add $0xc,%esp
1bb: 85 c0 test %eax,%eax
1bd: 75 e7 jne 1a6 <wrlock+0x22>
1bf: eb cf jmp 190 <wrlock+0xc>
Upvotes: 2
Views: 437
Reputation: 340466
I think this illustrates your potential deadlock. Assume a single processor executing your 3 threads in the following sequence:
// to start,
// readers == 0, writer == 0, waiting == 0
Reader1
===================================
// in rdlock()
atomic_inc(&waiting);
for (;;) {
atomic_inc(&readers);
// if (!writer) has not been executed yet
// readers == 1, writer == 0, waiting == 1
Writer
===================================
// in wrlock()
while (atomic_swap(&writer, 1)) spin();
while ((cur=readers)) futex_wait(&readers, cur);
// writer thread is waiting
// readers == 1, writer == 1, waiting == 1
// cur == 1
Reader1
===================================
// back to where it was in rdlock()
if (!writer) return;   // writer == 1, so it does not return
atomic_dec(&readers);
futex_wait(&writer, 1);
// Reader1 is waiting
// readers == 0, writer == 1, waiting == 1
Reader2
===================================
// in rdlock()
atomic_inc(&waiting);
for (;;) {
atomic_inc(&readers);
if (!writer) return;   // writer == 1, so it does not return
atomic_dec(&readers);
futex_wait(&writer, 1);
// Reader2 is waiting
// readers == 0, writer == 1, waiting == 2
Now you're deadlocked: the writer is blocked in futex_wait(&readers, 1) even though readers has since dropped back to 0 (nothing wakes it when a reader backs out of its increment), and both readers are blocked in futex_wait(&writer, 1), which only wrunlock() would wake.
Of course, I might not be understanding how the futex API works (I've never used it), so let me know if I'm off base here. I assume that a futex_wait() that blocks (because the expected value was correct) will not unblock until there's a futex_wake() call for that address. If atomic_xxx() operations can unblock a futex_wait(), this analysis is incorrect.
Finally, if this is what's happening to you, I haven't had a chance to think about possible solutions...
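That said, one direction the trace suggests (just a sketch, untested, and it only addresses this particular interleaving) would be to have a reader that backs out of its optimistic increment wake any writer that may have gone to sleep after seeing that increment:

void rdlock()
{
    atomic_inc(&waiting);
    for (;;) {
        atomic_inc(&readers);
        if (!writer) return;
        atomic_dec(&readers);
        /* A writer may be sleeping in futex_wait(&readers, cur) because it
         * saw our transient increment; since we're backing it out, wake the
         * writer so it can re-check readers. */
        futex_wake(&readers, 1);
        futex_wait(&writer, 1);
    }
}

In the trace above, that extra wake would let the writer see that readers has dropped back to 0 and proceed, and its eventual wrunlock() would then wake the two sleeping readers.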
Upvotes: 5
Reputation: 5507
My guess is that this is a memory ordering issue. I don't know the x86 memory model that well, but I strongly suspect you need a memory fence around your futex_* calls. I understand that x86 guarantees that one core will update the memory contents of other cores in the same order as the memory cells are written, but it seems that you are relying on a stronger assumption: that writes on one core are immediately visible to the others. For example, say that core A holds a rdlock and has just executed rdunlock: it has cleared both waiting and readers, but this information hasn't made it to core B by the time core B attempts wrlock. Core B successfully acquires writer, but sees that there are extant readers. The update to writer has not posted to core A by the point at which rdunlock checks whether it needs to futex_wake(&readers), so it does not. This could exhibit your symptoms, and it would also have the property that it would recover if the futex_* calls were replaced by simple spins. Does this make sense to you?
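Concretely, I mean something like this (a sketch using GCC's __sync_synchronize() as a full barrier; whether such barriers are really needed on x86 is exactly what I'm unsure about):

void rdunlock()
{
    atomic_dec(&waiting);
    atomic_dec(&readers);
    __sync_synchronize();  /* publish the decrements before re-reading writer */
    if (writer) futex_wake(&readers, ALL);
}

void wrlock()
{
    int cur;
    while (atomic_swap(&writer, 1)) spin();
    __sync_synchronize();  /* publish writer==1 before re-reading readers */
    while ((cur=readers)) futex_wait(&readers, cur);
}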
Hmm, should while ((cur=readers)) futex_wait(&readers, cur); be while ((cur==readers))... ?
Upvotes: 0