Pierre Abbat

Reputation: 523

How do I debug a deadlock of readers-writer locks?

I'm writing a program that has one thread that reads a file of points into a buffer and many threads that take points from the buffer and construct an octree of them. Each cube of the octree is protected by a readers-writer lock (aka shared_mutex), of which there are 67 (if there are two threads, which there are now). If the file is too big, the program deadlocks, and I'm stumped trying to debug it. One of the locks looks like this in gdb:

[6] = {_M_impl = {_M_rwlock = {__data = {__readers = 1, 
          __writers = 0, __wrphase_futex = 1, __writers_futex = 0, __pad3 = 0, 
          __pad4 = 0, __cur_writer = 0, __shared = 0, __rwelision = 0 '\000', 
          __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, 
        __size = "\001\000\000\000\000\000\000\000\001", '\000' <repeats 46 times>, __align = 1}}},

Most of the mutexes have __readers=1, one has __readers=3, and one has __readers=4294967289 or so (that is, -7 read as an unsigned 32-bit value). This makes no sense: there are only two threads, so at most two can be reading a mutex at once, and in the building-octree stage they should be write-locking, not read-locking, the mutexes. The -7 looks as if seven threads have read-unlocked a mutex without first read-locking it. Trying to set a watchpoint on __readers doesn't work; it crashes the debugger, or something like that.

I wrote a wrapper around the lockings and unlockings:

void lockBlockR(int block)
{
  metaMutex.lock();
  modReaders[block%modMutexSize]++;
  metaMutex.unlock();
  modMutex[block%modMutexSize].lock_shared();
}

void lockBlockW(int block)
{
  modMutex[block%modMutexSize].lock();
}

void unlockBlockR(int block)
{
  metaMutex.lock();
  if (--modReaders[block%modMutexSize]<0)
    cout<<"Read-unlocked "<<block<<" too many times\n";
  metaMutex.unlock();
  modMutex[block%modMutexSize].unlock_shared();
}

void unlockBlockW(int block)
{
  modMutex[block%modMutexSize].unlock();
}

When the program hung, I looked at modReaders and it's all zeros, then at modMutex and it again has most of the __readers=1 and one negative. How do I figure out what's going on?

I'm running Eoan Ermine, Linux 5.3.0, and libc 2.30. The program is compiled with gcc 9.2.1 in C++17.

I've previously used readers-writer locks and a modulo pool of locks in PerfectTIN (https://github.com/phma/perfecttin), but the locks in the modulo pool are ordinary mutexes.

ETA: I added another map of ints called modWriters and some debugging statements and caught a thread in the act of unlocking a mutex it didn't lock. It was write-locking and write-unlocking, though, so that doesn't explain why __readers was messed up.

Upvotes: 1

Views: 1184

Answers (1)


Consider using valgrind (its helgrind and DRD tools target threading errors), the static analysis options of GCC 10 (-fanalyzer), instrumentation options such as -fsanitize=thread, and the Clang static analyzer.

It is worthwhile to build GCC 10 from its source code.

Be aware that it is not always possible to statically and reliably detect all deadlocks (Rice's theorem). Read this draft report. You could have heisenbugs.
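Since static tools cannot catch everything, a run-time check can help too. The question's ETA mentions a thread unlocking a mutex it did not lock; the following is a minimal sketch of a wrapper that detects that case, not a drop-in replacement for the question's code. The class name CheckedSharedMutex, its members, and the helper foreignUnlockIsRejected are all hypothetical, and the unlock functions here return a bool (unlike std::shared_mutex) so that a bad unlock can be reported instead of causing undefined behaviour.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <thread>

// Hypothetical debugging wrapper; none of these names come from the question.
class CheckedSharedMutex
{
public:
  void lock()
  {
    mtx.lock();
    std::lock_guard<std::mutex> g(meta);
    writer = std::this_thread::get_id(); // remember who holds the write lock
  }

  // Returns false, leaving the mutex untouched, if the calling thread
  // is not the one that write-locked it.
  bool unlock()
  {
    {
      std::lock_guard<std::mutex> g(meta);
      if (writer != std::this_thread::get_id())
        return false;
      writer = std::thread::id(); // default id means "no writer"
    }
    mtx.unlock();
    return true;
  }

  void lock_shared()
  {
    mtx.lock_shared();
    std::lock_guard<std::mutex> g(meta);
    readers.insert(std::this_thread::get_id());
  }

  // Returns false if the calling thread holds no read lock on this mutex.
  bool unlock_shared()
  {
    {
      std::lock_guard<std::mutex> g(meta);
      auto it = readers.find(std::this_thread::get_id());
      if (it == readers.end())
        return false;
      readers.erase(it); // erase one entry only
    }
    mtx.unlock_shared();
    return true;
  }

private:
  std::shared_mutex mtx;
  std::mutex meta;                        // protects writer and readers
  std::thread::id writer;                 // default-constructed: no writer
  std::multiset<std::thread::id> readers; // one entry per read lock held
};

// Demonstration: a second thread tries to release a write lock it never took.
bool foreignUnlockIsRejected()
{
  CheckedSharedMutex m;
  bool accepted = true;
  m.lock();
  std::thread other([&] { accepted = m.unlock(); });
  other.join();
  const bool ownerOk = m.unlock(); // the real owner can still unlock
  assert(ownerOk);
  return !accepted;
}
```

Where the sketch returns false, a debugging build could instead print the block number and thread id, or call abort() so that gdb stops at the offending unlock.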

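Perhaps use the RAII lock wrappers of the C++ thread support library, notably std::lock_guard (and, for a std::shared_mutex, std::shared_lock and std::unique_lock), so that every lock is released by the same thread and scope that took it.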
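As an illustration of the RAII approach, here is a minimal sketch of a modulo pool of std::shared_mutex accessed only through std::shared_lock and std::unique_lock. The names kPoolSize, blockMutex, blockData, slot, readBlock, incrementBlock and demoCount are made up for this example, and blockData is only a stand-in for the octree cubes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical names and pool size, loosely modeled on the question's modMutex.
constexpr std::size_t kPoolSize = 67;
std::array<std::shared_mutex, kPoolSize> blockMutex;
std::array<int, kPoolSize> blockData{}; // stand-in for the octree cubes

std::size_t slot(int block)
{
  assert(block >= 0); // this sketch assumes non-negative block numbers
  return static_cast<std::size_t>(block) % kPoolSize;
}

int readBlock(int block)
{
  // The constructor calls lock_shared(); the destructor calls
  // unlock_shared() on every path out of the function.
  std::shared_lock<std::shared_mutex> guard(blockMutex[slot(block)]);
  return blockData[slot(block)];
}

void incrementBlock(int block)
{
  // The constructor calls lock(); the destructor calls unlock().
  std::unique_lock<std::shared_mutex> guard(blockMutex[slot(block)]);
  ++blockData[slot(block)];
}

// Starts nThreads writers that each increment the same block nIncrements
// times, and returns by how much the block's value grew.
int demoCount(int nThreads, int nIncrements)
{
  const int block = 5;
  const int before = readBlock(block);
  std::vector<std::thread> workers;
  for (int i = 0; i < nThreads; ++i)
    workers.emplace_back([=] {
      for (int j = 0; j < nIncrements; ++j)
        incrementBlock(block);
    });
  for (auto &t : workers)
    t.join();
  return readBlock(block) - before;
}
```

Because the guards unlock in their destructors, a lock taken in one function cannot be released by another thread or left held after an early return, which rules out the mismatched lock/unlock pairs that hand-written wrappers allow.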

You might prefer std::recursive_mutex to std::mutex even if recursive mutexes are slower and heavier (and some people say they should be avoided). My opinion is that they are very often safer.

You could consider using the multi-threading abilities of POCO or Qt or GtkMM libraries.

Be aware of futex(7), the basic building block of locking on Linux. You could use strace(1) to watch the futex system calls your threads make. For inter-thread communication or synchronization you could also use pipe(7) with poll(2); see also eventfd(2).

Upvotes: 5
