Reputation: 1034
I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?
I can run the program in different terminals, for example using pdb:
    mpiexec -n 4 xterm -e "python -m pdb my_program.py"
But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but I'd need to see the trace to figure out where the hang occurs.
Upvotes: 8
Views: 2661
Reputation: 1034
The Python trace module allows you to trace program execution. In order to store the trace of each process separately, you need to wrap your code in a function:
def my_program(*args, **kwargs):
    # insert your code here
    pass
And then run it with trace.Trace.runfunc:
import sys
import trace

from mpi4py import MPI

# define Trace object: trace line numbers at runtime, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process
# (buffering=1 requests line buffering, so traced lines are not held back in memory if the process hangs)
sys.stdout = open('trace_{:04d}.txt'.format(MPI.COMM_WORLD.rank), 'w', buffering=1)
tracer.runfunc(my_program)
Now the trace of each process will be written to a separate file: trace_0000.txt, trace_0001.txt, etc. Use the ignoredirs and ignoremods arguments to omit low-level calls.
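With ~80 trace files, reading each one by hand to see where a rank stopped is tedious. Below is a minimal sketch of a helper that prints the last few lines of every trace file, which is usually where a hung process was last seen. The function name last_traced_lines, the default pattern and the default of 3 lines are illustrative assumptions, not part of the trace module; the file names follow the trace_{:04d}.txt scheme used above.

```python
import glob
from collections import deque


def last_traced_lines(pattern="trace_*.txt", n=3):
    """Return {filename: last n lines} for every trace file matching pattern.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", errors="replace") as fh:
            # a deque with maxlen keeps only the final n lines in memory
            tails[path] = [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    return tails


if __name__ == "__main__":
    # print the tail of each per-rank trace file found in the current directory
    for path, lines in last_traced_lines().items():
        print(path)
        for line in lines:
            print("    " + line)
```

Run it in the directory containing the trace files once the program has hung. Ranks whose final lines point at the same communication call are likely the ones waiting on each other.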
Upvotes: 3