Is there an efficient way to debug an MPI master-worker model without a parallel debugger?

Question

I have a master-worker model in which a single master process spawns all the workers with MPI_Comm_spawn, which gives it the intercommunicator workercomm (an MPI_Comm). Of course, the first runs contain bugs, so I need debugging tools.

Debugging the master with gdb isn't hard, but debugging a worker is another story. The MPICH FAQ entry "Debugging applications in parallel" says gdb CAN be used with MPI, but for a process that is spawned by another process I have to insert an infinite loop and wait for gdb to attach, and I have to do this 10+ times per run. All I have is gdb; parallel debuggers aren't available to me. Is there anything that makes this less time-consuming?

Joachim · Accepted Answer

One way to debug an MPI program with gdb is to spawn one terminal emulator per process, each of which runs the application under the control of gdb, as suggested by the Open MPI documentation (https://docs.open-mpi.org/en/v5.0.x/app-debug/serial-debug.html#use-mpirun-to-launch-separate-instances-of-serial-debuggers). It helps to prepare a file of commands (~/gdb.cmd in my example) to be passed to each gdb instance. For example, with four processes:

mpirun -np 4 /usr/bin/xterm -e 'gdb -x ~/gdb.cmd --args ./a.out 10'

My command file typically contains the following commands, so that I don't have to interact with every terminal before the program even starts:

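# create breakpoints on symbols that are not loaded yet, without prompting
set breakpoint pending on
# never pause output with "--Type <RET> for more--"
set pagination off
# don't ask about downloading debug info (newer gdb only; delete if rejected)
set debuginfod enabled off
# foo is a placeholder for the function you want to stop in
break foo
# start the program given after --args
run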

For your use case, where additional processes are spawned with MPI_Comm_spawn, you would spawn xterm instead of your application, and have xterm start the application in gdb. You could even use different gdb command files for different calls to MPI_Comm_spawn.
