Quasar

Reputation: 100

mpi4py: pinpointing the erroneous rank after a segmentation fault

My question is not related to any specific code; it is general. I am currently running an MPI-parallelized code in Python and I occasionally encounter segmentation faults. Whenever a segfault occurs, I get an error message like the one below and the code exits:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 173577 RUNNING AT whatever_node
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

My question is this - how do I pinpoint the rank which caused this error? The exit message only mentions the PID. Can I use this to figure out the erroneous rank?

PS: The aforementioned code is not my own. I only run it and report any errors.

Upvotes: 1

Views: 85

Answers (1)

Jose Manuel de Frutos

Reputation: 994

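Maybe you can add a simple print like this near the start of your code, so that every rank logs its PID and node name. You can then match the PID and node from the error message against those lines:

from mpi4py import MPI
import os
# flush=True so the line is written out before a later crash can lose buffered output
print("Rank %d on %s, Process PID for worker = %d" % (MPI.COMM_WORLD.Get_rank(), MPI.Get_processor_name(), os.getpid()), flush=True)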
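Once every rank has printed that line, the PID and node from the termination banner can be looked up in the job log. Below is a minimal sketch of that lookup, assuming the log contains lines in exactly the format printed above. The helper names `build_pid_table` and `find_rank` and the sample log are made up for illustration. Note that PIDs are only unique per node, so the lookup uses the node name as well.

```python
import re

# Matches the line format printed by the snippet above.
LINE_RE = re.compile(r"Rank (\d+) on (\S+), Process PID for worker = (\d+)")

def build_pid_table(log_lines):
    """Map (hostname, pid) -> rank from the startup lines in a job log."""
    table = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            rank, host, pid = int(m.group(1)), m.group(2), int(m.group(3))
            table[(host, pid)] = rank
    return table

def find_rank(log_lines, host, pid):
    """Return the rank that ran as `pid` on `host`, or None if it was not logged."""
    return build_pid_table(log_lines).get((host, pid))

# Hypothetical log excerpt; in practice, read the lines from the job's output file.
sample_log = [
    "Rank 0 on whatever_node, Process PID for worker = 173575",
    "Rank 1 on whatever_node, Process PID for worker = 173577",
    "Rank 2 on other_node, Process PID for worker = 173577",
]

# PID and node taken from the BAD TERMINATION message.
crashed_rank = find_rank(sample_log, "whatever_node", 173577)
print(crashed_rank)
```

If the lookup returns None, the crashing process probably died before its line reached the log, which is one reason to print and flush as early as possible.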

Upvotes: 1
