How do I debug a deadlock of readers-writer locks?

Question

I'm writing a program in which one thread reads a file of points into a buffer and many other threads take points from the buffer and build an octree from them. Each cube of the octree is protected by a readers-writer lock (a std::shared_mutex) taken from a pool of such locks; there are 67 of them when there are two threads, which is the case now. If the file is too big, the program deadlocks, and I'm stumped trying to debug it. One of the locks looks like this in gdb:

[6] = {_M_impl = {_M_rwlock = {__data = {__readers = 1, 
          __writers = 0, __wrphase_futex = 1, __writers_futex = 0, __pad3 = 0, 
          __pad4 = 0, __cur_writer = 0, __shared = 0, __rwelision = 0 '\000', 
          __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, 
        __size = "\001\000\000\000\000\000\000\000\001", '\000' , __align = 1}}},

Most of the mutexes have __readers=1, one has __readers=3, and one has __readers=4294967289, which is -7 read as a 32-bit unsigned int. This makes no sense. There are only two threads, so at most two can be reading a lock. In the octree-building stage they should be write-locking the mutexes, not read-locking them. And -7 looks as if seven threads read-unlocked a mutex without first read-locking it. Setting a watchpoint on __readers doesn't work either; it seems to crash the debugger.

I wrote a wrapper around the lockings and unlockings:

void lockBlockR(int block)
{
  metaMutex.lock();
  modReaders[block%modMutexSize]++;
  metaMutex.unlock();
  modMutex[block%modMutexSize].lock_shared();
}

void lockBlockW(int block)
{
  modMutex[block%modMutexSize].lock();
}

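void unlockBlockR(int block)
{
  metaMutex.lock();
  if (--modReaders[block%modMutexSize]<0)
    // The listing is cut off from this point; the rest mirrors lockBlockR.
    cout<<"Read-unlocked "<<block<<'\n';
  metaMutex.unlock();
  modMutex[block%modMutexSize].unlock_shared();
}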


When the program hung, I looked at modReaders: it was all zeros. Then I looked at modMutex, and again most of the locks had __readers=1 and one had a negative value. How do I figure out what's going on?
I'm running Ubuntu 19.10 (Eoan Ermine), Linux 5.3.0, and glibc 2.30. The program is compiled with GCC 9.2.1 in C++17 mode.
I've previously used readers-writer locks and a modulo pool of locks in PerfectTIN (https://github.com/phma/perfecttin), but there the locks in the modulo pool are ordinary mutexes.
ETA: I added another map of ints, called modWriters, plus some debugging statements, and caught a thread in the act of unlocking a mutex it hadn't locked. That thread was write-locking and write-unlocking, though, so it doesn't explain why __readers was messed up.
