Reputation: 752
I am running an MPI application on a cluster, using 4 nodes with 64 cores each. The application performs an all-to-all communication pattern.
Launching the application as follows runs fine:
$: mpirun -npernode 36 ./Application
Adding one more process per node makes the application crash:
$: mpirun -npernode 37 ./Application
--------------------------------------------------------------------------
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.
For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: laser045
Local device: qib0
Queue pair type: Reliable connected (RC)
--------------------------------------------------------------------------
[laser045:15359] *** An error occurred in MPI_Issend
[laser045:15359] *** on communicator MPI_COMM_WORLD
[laser045:15359] *** MPI_ERR_OTHER: known error not in list
[laser045:15359] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[laser040:49950] [[53382,0],0]->[[53382,1],30] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 163]
[laser040:49950] [[53382,0],0]->[[53382,1],21] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 154]
--------------------------------------------------------------------------
mpirun has exited due to process rank 128 with PID 15358 on
node laser045 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[laser040:49950] 4 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[laser040:49950] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[laser040:49950] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
EDIT: added some source code of the all-to-all communication pattern:
// Send data to all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }
    MPI_Request request;
    MPI_Issend(&data, dataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &request);
    requests.push_back(request);
}

// Recv data from all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }
    MPI_Status status;
    MPI_Recv(&recvData, recvDataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
}

// Finish communication operations
for(MPI_Request &r: requests){
    MPI_Status status;
    MPI_Wait(&r, &status);
}
Is there something I can do as a cluster user, or any advice I could pass on to the cluster admin?
Upvotes: 2
Views: 2035
Reputation: 51
The mca_oob_tcp_msg_send_handler error lines may indicate that the node corresponding to a receiving rank died (ran out of memory or received a SIGSEGV):
http://www.open-mpi.org/faq/?category=tcp#tcp-connection-errors
The OOB (out-of-band) framework in Open MPI is used for control messages, not for your application's messages. Application messages go through the byte transfer layers (BTLs) such as self, sm, vader, openib (InfiniBand), and so on.
The output of 'ompi_info -a' is useful in that regard.
Finally, the question does not specify whether the InfiniBand hardware vendor is Mellanox, so the XRC option may not work (for instance, Intel/QLogic InfiniBand does not support it).
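As a starting point, you can inspect the receive-queue configuration of the openib BTL (the parameter tuned in the other answer) on a compute node; this is just a sketch, and the exact parameter names depend on the Open MPI version installed on the cluster:
$ ompi_info -a | grep btl_openib_receive_queues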
Upvotes: 3
Reputation: 752
The error is related to the buffer sizes of the MPI receive queues, which are discussed here:
http://www.open-mpi.org/faq/?category=openfabrics#ib-xrc
The following environment setting solved my problem:
$ export OMPI_MCA_btl_openib_receive_queues="P,128,256,192,128:S,65536,256,192,128"
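The same setting can also be passed as an MCA parameter on the mpirun command line, which should be equivalent to exporting the environment variable (shown here with the -npernode value from the question):
$ mpirun --mca btl_openib_receive_queues "P,128,256,192,128:S,65536,256,192,128" -npernode 37 ./Application
As I understand the FAQ entry, "P,..." defines a small per-peer receive queue while "S,..." defines a shared receive queue (SRQ) for the large buffers, which reduces the per-connection resources needed and presumably lets the queue pair creation succeed.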
Upvotes: 2