Reputation: 100
My question is general rather than tied to any specific code. I am running an MPI-parallelized Python code that occasionally hits segmentation faults. Whenever a segfault occurs, I get an error message like the one below and the code exits:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 173577 RUNNING AT whatever_node
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
How do I pinpoint the rank that caused this error? The exit message only mentions the PID. Can I use it to work out which rank failed?
PS: The code is not my own; I only run it and report any errors.
Upvotes: 1
Views: 85
Reputation: 994
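You could add a simple print like this near the start of the code, so that every rank reports its PID and host. The PID in the error message can then be matched to a rank:
import os
from mpi4py import MPI
# flush=True so the line reaches the log even if the process crashes later
print("Rank %d on %s, Process PID for worker = %d" % (MPI.COMM_WORLD.Get_rank(), MPI.Get_processor_name(), os.getpid()), flush=True)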
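Once those lines are in the job output, looking up the crashed PID can be scripted. Here is a minimal sketch that assumes the output uses exactly the line format printed above. The helper names `build_pid_map` and `find_rank` and the sample log are made up for illustration. PIDs are only unique per node, so the lookup is keyed on host and PID together.

```python
import re

# Matches the line format printed by the snippet above.
LINE_RE = re.compile(r"Rank (\d+) on (\S+), Process PID for worker = (\d+)")


def build_pid_map(log_text):
    """Return {(host, pid): rank} for every matching line in the job output."""
    mapping = {}
    for match in LINE_RE.finditer(log_text):
        rank, host, pid = match.groups()
        mapping[(host, int(pid))] = int(rank)
    return mapping


def find_rank(log_text, host, pid):
    """Return the rank that ran as `pid` on `host`, or None if it was not logged."""
    return build_pid_map(log_text).get((host, pid))


# Hypothetical job output: the rank numbers and the first PID are invented.
sample_log = """\
Rank 0 on whatever_node, Process PID for worker = 173576
Rank 1 on whatever_node, Process PID for worker = 173577
"""

# Host and PID taken from the "BAD TERMINATION" message.
crashed_rank = find_rank(sample_log, "whatever_node", 173577)
print(crashed_rank)
```

In practice you would read the real job log into `log_text` instead of using the inline sample.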
Upvotes: 1