Nitin Labhishetty

Reputation: 1348

MPI programs hanging up

I installed mpich2 on my Ubuntu 14.04 laptop with the following command:

sudo apt-get install libcr-dev mpich2 mpich2-doc

This is the code I'm trying to execute:

#include <mpi.h>
#include <stdio.h>

int main()
{
    int myrank, size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world! I am %d of %d\n", myrank, size);

    MPI_Finalize();
    return 0;
}

Compiling it with mpicc helloworld.c gives no errors. But when I run the program with mpirun -np 5 ./a.out, there is no output; the program just keeps running as if it were in an infinite loop. When I press Ctrl+C, this is what I get:

$ mpirun -np 5 ./a.out
^C[mpiexec@user] Sending Ctrl-C to processes as requested
[mpiexec@user] Press Ctrl-C again to force abort
[mpiexec@user] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)
[mpiexec@user] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec@user] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@user] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@user] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@user] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

I couldn't find a solution by searching online. What is causing this error?

Upvotes: 3

Views: 6621

Answers (1)

jyvet

Reputation: 2201

I was getting the same issue with two compute nodes:

$ mpirun -np 10 -ppn 5 --hosts c1,c2 ./a.out
[mpiexec@c1] Press Ctrl-C again to force abort
[mpiexec@c1] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad file descriptor)
[mpiexec@c1] HYD_pmcd_pmiserv_send_signal (pm/pmiserv/pmiserv_cb.c:169): unable to write data to proxy
[mpiexec@c1] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@c1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@c1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@c1] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

It turned out that node c1 couldn't ssh into c2, so mpiexec could never start the processes on c2 and just sat there waiting.
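A quick way to check for this kind of problem before launching is to test non-interactive ssh to each host from the node where you run mpirun. The following is only a minimal sketch, assuming an OpenSSH client; check_hosts and SSH_CMD are names made up for this example (SSH_CMD is just a hook for swapping in another command), and c1 and c2 are the hosts from the command above.

```shell
#!/bin/sh
# Sketch: verify that passwordless, non-interactive ssh works to every host.
# check_hosts and SSH_CMD are hypothetical names used only in this example.
check_hosts() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout=5 gives up after 5 seconds instead of hanging.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached.
    return "$failed"
}
```

Running check_hosts c1 c2 on c1 should print one line per host. Any host reported as FAILED most likely needs its ssh keys or host name resolution fixed before mpirun can start processes on it.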

If you are using only a single machine, you can try using fork as the launcher:

mpirun -launcher fork -np 5 ./a.out

Upvotes: 1
