Reputation: 361
Almost all MPI routines return an error code. However, a communication error usually crashes the program at the spot where the MPI routine is called, which makes the returned error code useless. Is there a way to catch the error in such a case? Or alternatively, how can we prevent the program from crashing when a catastrophic error happens, so that we can catch the error?
Upvotes: 0
Views: 1906
Reputation: 9519
The way MPI functions behave on error changed slightly with the later standards. It used to be managed with the MPI_Errhandler_{get|set|create}() functions (deprecated since MPI 2.0 and removed in MPI 3.0). It is now managed through the MPI_{Comm|Win|File}_{get|set|create}_errhandler() functions, which allow much finer control because a handler is attached to a specific communicator, window or file.
All MPI libraries provide two predefined error handlers (an implementation may offer more):
MPI_ERRORS_ARE_FATAL, which aborts the entire program whenever an error occurs within an associated MPI call; and
MPI_ERRORS_RETURN, which simply returns from the associated MPI call upon error, with the corresponding error code.
By default, every MPI call except those associated with input/output aborts the program on error. Conversely, the MPI-IO calls normally return with the corresponding error code. Actually, the standard is a bit less prescriptive and says:
By default, communication errors are fatal -- MPI_ERRORS_ARE_FATAL is the default error handler associated with MPI_COMM_WORLD. I/O errors are usually less catastrophic (e.g., "file not found") than communication errors, and common practice is to catch these errors and continue executing.
So, to answer your questions plainly: if you want to prevent the code from crashing on error, catch the errors and implement some contingency procedure, you have mostly two solutions:
Set the error handler to MPI_ERRORS_RETURN for the communicator, file or window you want, and check the error code upon completion of the associated MPI calls. You will then have to take action based on the exact error returned each time, bearing in mind that once an error has occurred inside an MPI call, there is no guarantee that any further MPI call will succeed. Indeed, chances are that any subsequent call to MPI will crash.
Create your own error handler with MPI_{Comm|Win|File}_create_errhandler() and attach it to the communicator, file or window you want.
But again, the fact that no MPI call is guaranteed to succeed after a first error has been encountered within the library greatly limits the scope of what can be done. Most of the time, the default behavior is perfectly suitable and can be left untouched.
Upvotes: 4