user3711746

Reputation: 13

Why does mpirun freeze in a loop?

Here are my shell script and Python code.

$ cat go

while true
do
  echo "------->"
  python3 -m mpi4py ./go.py
  echo "<------"
done

This script runs go.py in an endless loop.

$ cat go.py

import mpi4py.MPI as MPI

print( "######", MPI.Is_initialized())

comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
comm_size = comm.Get_size()

# point to point communication
data_send = [comm_rank]*5
comm.send(data_send,dest=(comm_rank+1)%comm_size)
data_recv =comm.recv(source=(comm_rank-1)%comm_size)
print("my rank is %d, and Ireceived:" % comm_rank)
print( data_recv )

MPI.Finalize()

print( "######", MPI.Is_finalized())

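This Python code sends a small list from each rank to the next rank and prints what it receives.

When I run the go script under mpirun, go.py executes and exits the first time; when go.py is executed a second time, it gets stuck.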

$ mpirun --mca orte_base_help_aggregate 0 -np 2 sh ./go

------->
------->
--------------------------------------------------------------------------
[[27909,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[27909,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
###### True
###### True
my rank is 0, and Ireceived:
[1, 1, 1, 1, 1]
my rank is 1, and Ireceived:
[0, 0, 0, 0, 0]
###### True
###### True
<------
------->
<------
------->
--------------------------------------------------------------------------
[[27909,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[27909,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: myvm20

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

and it freezes forever.

Why does it get stuck, and how can I make this script continue?

BTW: I have two kinds of jobs to run, A and B. Job A persists, while job B finishes and quits. So I can't run them as follows:

while true
do
  echo "------->"
  mpirun -np 2 A : -np 2 B
  echo "<------"
done

Is there another way to do this?

Upvotes: 1

Views: 540

Answers (1)

Gilles Gouaillardet

Reputation: 8395

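Long story short, you cannot do that: each rank that mpirun starts can go through MPI initialization and finalization only once, so a script launched by mpirun cannot start a new MPI program on every iteration.

Here is what you should do instead: put mpirun inside the loop, so that every iteration launches a fresh MPI job.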

while true
do
  echo "------->"
  mpirun --mca orte_base_help_aggregate 0 -np 2 python3 -m mpi4py ./go.py
  echo "<------"
done
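If you would rather drive the loop from Python than from sh, the same idea can be sketched as below. This is only a sketch under assumptions: the helper names `build_mpirun_command` and `run_repeatedly` are made up for illustration, and stopping at the first non-zero exit code is my choice, not something the shell loop above does.

```python
import subprocess


def build_mpirun_command(np, script, launcher="mpirun"):
    """Return the argv list for one mpirun launch of an mpi4py script.

    The options mirror the command line used in this thread.
    """
    return [
        launcher,
        "--mca", "orte_base_help_aggregate", "0",
        "-np", str(np),
        "python3", "-m", "mpi4py", script,
    ]


def run_repeatedly(np, script, iterations, runner=subprocess.run):
    """Start a fresh mpirun for each iteration and collect the exit codes.

    `runner` is injectable so the loop can be exercised without MPI installed.
    The loop stops early at the first non-zero exit code (an assumption).
    """
    exit_codes = []
    for _ in range(iterations):
        print("------->", flush=True)
        result = runner(build_mpirun_command(np, script))
        print("<------", flush=True)
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break
    return exit_codes
```

Calling `run_repeatedly(2, "./go.py", 3)` would then launch three separate two-rank jobs one after another, each with its own mpirun.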

Upvotes: 1