Reputation: 5015
I need to write an MPI program that just has to start a few processes on different cluster nodes. This is my sample code:
#include <stdio.h>
#include <stdlib.h>   /* for system() */
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, nodenamesize;
    char nodename[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(nodename, &nodenamesize);
    printf("Hello world! I am %d of %d running on %s\n", rank, size, nodename);

    if (rank == 0) {
        system("./Longwait&");
    } else if (rank == 1) {
        system("./AnotherLongWait&");
    }

    MPI_Finalize();
    return 0;
}
It successfully starts the processes, but the MPI application doesn't terminate; it keeps waiting even after MPI_Finalize() has been called.
What is wrong with this code? What do I need to do so that the MPI program just starts the other applications without waiting for them?
Thank you, Regards, Robo.
Upvotes: 1
Views: 1804
Reputation: 74385
The cause of the delay is the mechanism that Open MPI uses in order to provide I/O redirection. Tip: use system("ls -l /proc/self/fd"); or system("lsof -c lsof"); to get an idea of how many file descriptors are open in the child processes spawned by system(3). These descriptors are held open by both Longwait and AnotherLongWait, which makes the MPI run-time wait for them to complete.
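For example, here is a minimal sketch of where such a check could go (assuming a Linux-style /proc filesystem):

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* The child spawned by system(3) lists its own open descriptors;
           among them are the pipes Open MPI set up for I/O redirection. */
        system("ls -l /proc/self/fd");
    }

    MPI_Finalize();
    return 0;
}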
Here is a simple example with two very simple sample versions of Longwait:
Version 1: Sleeps 1 minute
#include <unistd.h>

int main (void)
{
    sleep(60);
    return 0;
}
If you spawn this program with system("./Longwait&"); you will have to wait for it to finish first before mpirun/mpiexec will also finish.
Version 2: Blindly closes the first 20 file descriptors before sleeping
#include <unistd.h>

int main (void)
{
    int i;

    /* Blindly close the first 20 file descriptors, including those
       inherited from the MPI run-time, then sleep */
    for (i = 0; i < 20; i++)
        close(i);
    sleep(60);
    return 0;
}
If you spawn this program as before, the mpirun/mpiexec executable will finish shortly after the MPI program exits, without waiting.
Now, this is not a real solution - randomly closing open file descriptors can have unpredictable effects, and finding out which descriptors should be closed is neither easy nor portable. I would generally advise against doing what you do in your code. Besides, Open MPI does not reliably support process forking on systems with an InfiniBand interconnect (system(3) uses fork(2) behind the scenes).
Upvotes: 2