Reputation: 33
I'm running a project using Boost MPI (1.55) over Open MPI (1.6.1) on a compute cluster.
Our cluster has nodes with 64 CPUs and we spawn a single MPI process on each. Most of our communication is between individual processes: each has a series of irecv() requests open (for different tags), and sends are carried out with blocking send().
The problem is that after a short time of processing (usually under 10 minutes), we get this error, which causes the program to end:
[btl_tcp_component.c:1114:mca_btl_tcp_component_accept_handler] accept() failed: Too many open files in system (23).
Closer debugging shows that it's network sockets taking up these file handles, and that we're hitting our OS limit of 65536 open file handles. Most of these sockets are in the TIME_WAIT state, which is apparently what TCP does for (usually) 60 seconds after a socket is closed (in order to catch any late packets). I was under the impression that Open MPI didn't close sockets (http://www.open-mpi.org/faq/?category=tcp#tcp-socket-closing) and just kept up to N^2 sockets open so that all processes could talk to each other. Obviously 65536 is way beyond 64^2 (the most common cause of this error involving MPI is simply that the file limit is less than N^2), and most of those sockets had been closed only recently.
Our C++ code is too large to fit here, but I've written a simplified version of some of it to at least show our implementation and see if there are any issues with our technique. Is there something in our usage of MPI that could be causing Open MPI to close and reopen too many sockets?
namespace mpi = boost::mpi;
mpi::communicator world;

bool poll(ourDataType data, mpi::request & dataReq, ourDataType2 work, mpi::request workReq) {
    if(dataReq.test()) {
        processData(data); // do a bunch of work
        dataReq = world.irecv(mpi::any_source, DATATAG, data);
        return true;
    }
    if(workReq.test()) {
        int target = assess(work);
        world.send(target, DATATAG, dowork);
        world.irecv(mpi::any_source, WORKTAG, data);
        return true;
    }
    return false;
}

bool receiveFinish(mpi::request finishReq) {
    if (finishReq.test()) {
        world.send(0, RESULTS, results);
        resetSelf();
        finishReq = world.irecv(0, FINISH);
        return true;
    }
    return false;
}

void run() {
    ourDataType data;
    mpi::request dataReq = world.irecv(mpi::any_source, DATATAG, data);
    ourDataType2 work;
    mpi::request workReq = world.irecv(mpi::any_source, WORKTAG, work);
    mpi::request finishReq = world.irecv(0, FINISH); // the root process can call a halt
    while(!receiveFinish(finishReq)) {
        bool doWeContinue = poll(data, dataReq);
        if(doWeContinue) {
            continue;
        }
        // otherwise we do other work
        results = otherwork();
        world.send(0, RESULTS, results);
    }
}
Upvotes: 3
Views: 878
Reputation: 74475
This might not be the real reason for Open MPI opening so many sockets, but you appear to be leaking requests in the following part of the poll() function:
    if(workReq.test()) {
        int target = assess(work);
        world.send(target, DATATAG, dowork);
        world.irecv(mpi::any_source, WORKTAG, data); // <-------
        return true;
    }
The request handle returned by world.irecv() is never saved and is therefore lost. If poll() is called periodically with the same workReq object, this branch will execute on every call after the request has completed, since testing an already completed request always returns true. Therefore you will start lots of non-blocking receives that are never waited on or tested, not to mention the messages being sent.
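For illustration, one way to plug the leak is to keep the new request in a handle that outlives the call, e.g. by passing workReq (and the buffers) by reference and assigning the result of irecv() back to it. This is only a minimal sketch based on the code above; it assumes the send and the re-posted receive are both meant to use work rather than dowork and data, which only you can tell:

bool poll(ourDataType& data, mpi::request& dataReq,
          ourDataType2& work, mpi::request& workReq) {
    if (dataReq.test()) {
        processData(data); // do a bunch of work
        dataReq = world.irecv(mpi::any_source, DATATAG, data);
        return true;
    }
    if (workReq.test()) {
        int target = assess(work);
        world.send(target, DATATAG, work);
        // keep the handle so this receive can be tested (or waited on) later
        workReq = world.irecv(mpi::any_source, WORKTAG, work);
        return true;
    }
    return false;
}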
A similar problem exists in receiveFinish(): finishReq is passed by value, so the assignment won't affect the value in run().
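A similar sketch for receiveFinish(), taking the request by reference so the re-posted receive replaces the completed one (results and resetSelf() taken from your code as-is):

bool receiveFinish(mpi::request& finishReq) {
    if (finishReq.test()) {
        world.send(0, RESULTS, results);
        resetSelf();
        // finishReq now refers to the caller's object, so this assignment sticks
        finishReq = world.irecv(0, FINISH);
        return true;
    }
    return false;
}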
A side note: is this really the code you use? The poll() function you call in run() takes two arguments, while the one shown here takes four, and none of the arguments have default values.
Upvotes: 1