Lærne

Reputation: 3142

How to debug a rare deadlock?

I'm trying to debug a custom thread pool implementation that rarely deadlocks. I cannot simply use a debugger like gdb, because I would have to launch it about 100 times before hitting a deadlock.

Currently, I'm running the thread pool test in an infinite loop from a shell script, but that means I cannot inspect variables and so on. I'm trying to print data with std::cout, but that slows down the threads and reduces the chance of a deadlock, so I can wait about an hour with my infinite loop before getting any messages. Then the messages don't show me the error and I need more of them, which means waiting yet another hour...

How can I efficiently debug the program so that it restarts over and over until it deadlocks? (Or should I open another question with all the code to get some help?)

Thank you in advance!

Bonus question: how do I check that everything goes fine with a std::condition_variable? You cannot really tell which threads are asleep, or whether a race condition occurs on the wait condition.

Upvotes: 20

Views: 15597

Answers (5)

Mozilla rr open source replay based debugging

https://github.com/mozilla/rr

Hans mentioned replay based debugging, but there is a specific open source implementation that is worth mentioning: Mozilla rr.

First you do a record run, and then you can replay the exact same run as many times as you want, and observe it in GDB, and it preserves everything, including input / output and thread ordering.

The official website mentions:

rr's original motivation was to make debugging of intermittent failures easier

Furthermore, rr enables GDB reverse debugging commands such as reverse-next to go to the previous line, which makes it much easier to find the root cause of the problem.

Here is a minimal example of rr in action: How to go to the previous line in GDB?

Upvotes: 4

deb0ch

Reputation: 1202

A quick and easy way to find deadlocks is to keep some global variables that you update at the points you want to inspect, and then print them in a signal handler. You can use SIGINT (sent when you interrupt with Ctrl+C) or SIGTERM (the default signal sent by kill):

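#include <csignal>
#include <cstdlib>
#include <iostream>

int dbg;  // debug variable, updated at the points you want to inspect

// Note: std::cout and std::exit are not async-signal-safe; this is
// tolerable only as a quick debugging hack.
void dbg_sighandler(int)
{
  std::cout << dbg << std::endl;  // was dbg1, which is never declared
  std::exit(EXIT_FAILURE);
}

int multithreaded_function()
{
  std::signal(SIGINT, dbg_sighandler);  // dump dbg on Ctrl+C
  ...
  dbg = someVar;
  ...
}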

That way you see the state of all your debug variables when you interrupt the program with Ctrl+C.

In addition you can run it in a shell while loop:

$> while [ $? -eq 0 ]
   do
   ./my_program
   done

which will run your program forever until it fails ($? is the exit status of your program and you exit with EXIT_FAILURE in your signal handler).

It worked well for me, especially for finding out how many threads passed before and after which locks.

It is quite rustic, but you do not need any extra tool and it is fast to implement.
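
A slightly more careful variant of the same idea, as a sketch only: it assumes a POSIX system, and the counter names dbg_before_lock and dbg_after_lock and the helper format_counter are made up for illustration. The counters are std::atomic<int> so that several threads can bump them without a data race, and the handler formats the numbers by hand and uses write() and _exit(), which are async-signal-safe, instead of std::cout and std::exit. Reading the atomics inside the handler is only safe if std::atomic<int> is lock-free on your platform, which is the usual case.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <unistd.h>   // write, _exit, STDERR_FILENO (POSIX)

// Hypothetical debug counters: increment them around the lock you suspect.
std::atomic<int> dbg_before_lock{0};
std::atomic<int> dbg_after_lock{0};

// Writes "<label><value>\n" into buf without printf or heap allocation.
// Returns the number of bytes stored (never more than cap).
std::size_t format_counter(char* buf, std::size_t cap,
                           const char* label, int value)
{
    std::size_t n = 0;
    while (*label != '\0' && n < cap) buf[n++] = *label++;

    // Collect the decimal digits in reverse order.
    char digits[12];
    std::size_t d = 0;
    unsigned int v = value < 0 ? 0u - static_cast<unsigned int>(value)
                               : static_cast<unsigned int>(value);
    do {
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (value < 0 && n < cap) buf[n++] = '-';
    while (d > 0 && n < cap) buf[n++] = digits[--d];
    if (n < cap) buf[n++] = '\n';
    return n;
}

// Signal handler: dump both counters to stderr and exit with failure.
extern "C" void dbg_sighandler(int)
{
    char buf[64];
    std::size_t n = format_counter(buf, sizeof buf, "before_lock=",
                                   dbg_before_lock.load());
    ssize_t r = write(STDERR_FILENO, buf, n);
    n = format_counter(buf, sizeof buf, "after_lock=",
                       dbg_after_lock.load());
    r = write(STDERR_FILENO, buf, n);
    (void)r;  // nothing useful can be done if the write fails
    _exit(EXIT_FAILURE);
}

// Call once at startup, before the worker threads are created.
void install_dbg_handler()
{
    std::signal(SIGINT, dbg_sighandler);
}
```

In the worker code you would do ++dbg_before_lock; just before taking the mutex and ++dbg_after_lock; just after. If the two numbers differ when you hit Ctrl+C, some thread is stuck between them.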

Upvotes: 1

Maja Piechotka

Reputation: 7216

There are 2 basic ways:

  1. Automate running the program under the debugger. Using gdb program -ex 'run <args>' -ex 'quit' should run the program under the debugger and then quit. If the program is still alive in one form or another (segfault, or you broke it manually) you will be asked for confirmation.
  2. Attach the debugger after reproducing the deadlock. For example, gdb can be run as gdb <program> <pid> to attach to a running program: just wait for the deadlock and attach then. This is especially useful when an attached debugger changes the timing so that you can no longer reproduce the bug.

In this way you can just run it in a loop and wait for the result while you drink coffee. BTW, I find the second option easier.
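
To avoid having to notice the hang yourself, the test binary can detect it. The following is only a sketch under assumptions: g_progress, wait_for_stall and watchdog are hypothetical names, and it presumes the test keeps the pool busy all the time, so that "no task finished for a while" really means a deadlock and not an idle pool. When the watchdog sees a stall it calls std::abort(), which leaves a core dump (if core dumps are enabled on your system) that you can open in gdb afterwards.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>

// Hypothetical progress counter: each worker increments it after a task.
std::atomic<unsigned long> g_progress{0};

// Polls `progress` every `poll` interval. Returns true once the counter
// has not changed for at least `timeout`; returns false if `stop` is set.
bool wait_for_stall(const std::atomic<unsigned long>& progress,
                    const std::atomic<bool>& stop,
                    std::chrono::milliseconds timeout,
                    std::chrono::milliseconds poll)
{
    using clock = std::chrono::steady_clock;
    unsigned long last = progress.load();
    auto last_change = clock::now();
    while (!stop.load()) {
        std::this_thread::sleep_for(poll);
        const unsigned long now = progress.load();
        if (now != last) {
            last = now;                      // progress was made
            last_change = clock::now();
        } else if (clock::now() - last_change >= timeout) {
            return true;                     // stalled for too long
        }
    }
    return false;
}

// Watchdog thread body: abort the process when no task completes in time.
void watchdog(const std::atomic<bool>& stop,
              std::chrono::milliseconds timeout)
{
    if (wait_for_stall(g_progress, stop, timeout,
                       std::chrono::milliseconds(50))) {
        std::fprintf(stderr, "no progress for %lld ms, aborting\n",
                     static_cast<long long>(timeout.count()));
        std::abort();  // produces a core dump when they are enabled
    }
}
```

You would start it with something like std::thread(watchdog, std::cref(stop_flag), std::chrono::seconds(10)) at the beginning of the test (std::cref needs <functional>), and set stop_flag and join the thread on normal exit. If you prefer attaching to the live process as in option 2, replacing std::abort() with raise(SIGSTOP) should freeze the process in its deadlocked state until you attach.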

Upvotes: 21

wilx

Reputation: 18228

You can run your test case under GDB in a loop using the command shown in https://stackoverflow.com/a/8657833/341065: gdb --eval-command=run --eval-command=quit --args ./a.out.

I have used this myself: (while gdb --eval-command=run --eval-command=quit --args ./thread_testU ; do echo . ; done).

Once it deadlocks and does not exit, you can just interrupt it with Ctrl+C to drop into the debugger.

Upvotes: 2

Hans Klünder

Reputation: 2292

If this is some kind of homework, restarting again and again with more debug output is a reasonable approach.

If somebody pays money for every hour you wait, they might prefer to invest in software that supports replay-based debugging, that is, software that records everything a program does, every instruction, and allows you to replay it again and again, debugging back and forth. Thus, instead of adding more debug output, you record a session during which a deadlock happens, and then start debugging just before the deadlock happened. You can step back and forth as often as you want, until you finally find the culprit.

The software mentioned in the link actually supports Linux and multithreading.

Upvotes: 6
