Reputation: 1768
I am working with a very basic Python code (filename: test_mpi.py) to try out parallel programming in Python using mpi4py. What I am trying to do is to have a two-dimensional numpy array with zeros for all entries, and then use specific processors in a cluster to increase the values of specific elements of that array.
Specifically, I have a 3*3 numpy matrix (mat) with all elements set to zero. After my code finishes running (across multiple processors), I want the matrix to look like this:
mat = [[ 1.  2.  3.]
       [ 4.  5.  6.]
       [ 7.  8.  9.]]
This is a fairly simple task and I expect my code to finish running within a few minutes (if not less). However, my code keeps running for a very long time and never stops executing (ultimately I have to delete the job after many hours).
This is my code:
from __future__ import division
from mpi4py import MPI
import os
import time
import numpy as np

comm = MPI.COMM_WORLD
nproc = comm.Get_size()
rank = comm.Get_rank()

start_time = time.time()

mat = np.zeros((3,3))
comm.bcast([ mat , MPI.DOUBLE], root=0)

for proc in range(1, nproc):
    if rank == proc:
        print "I'm processor: ", rank
        var = proc
        comm.send( var, dest=0, tag = (proc*1000) )
        print "Processor: ", rank, " finished working."

if rank == 0:
    print "Hello! I'm the master processor, rank: ", rank
    for i in range(0,dim):
        for j in range(0, dim):
            proc = ((i*j)+1)
            mat[i,j] += comm.recv(source=proc, tag=(proc*1000) )
    np.savetxt('mat.txt', mat)

print time.time() - start_time
This is the job script I use to execute this Python code:
#!/bin/sh
#PBS -l nodes=2:ppn=16
#PBS -N test_mpi4py
#PBS -m abe
#PBS -l walltime=168:00:00
#PBS -j eo
#PBS -q physics
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=16
export I_MPI_PIN=off
echo 'This job started on: ' `date`
/opt/intel/impi/2018.0.128/intel64/bin/mpirun -np 32 python test_mpi.py
I use qsub jobscriptname.sh to submit the job script. What am I missing here? I would appreciate any help.
Upvotes: 1
Views: 2537
Reputation: 17159
Your code did not finish because some of the MPI communications did not complete.
MPI requires that for every send there should be exactly one receive.
Your first loop is executed by every MPI rank independently. The condition rank == proc is satisfied exactly once for each rank except rank 0, so comm.send is executed nproc - 1 times. Your second loop runs dim * dim times, so comm.recv is also executed dim * dim times. Unless nproc - 1 == dim * dim, this requirement is not satisfied and some recv or send operations wait indefinitely for a matching operation. In your example 31 != 9, so the communications never complete and the job runs until the walltime is exceeded.
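As an aside, here is a minimal sketch (not from the original post) of that matching rule with mpi4py's pickle-based send/recv: one blocking send on a worker rank paired with exactly one receive on rank 0.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 1 posts exactly one send; rank 0 posts exactly one matching receive.
# If either side were missing, the other would block forever.
if rank == 1:
    comm.send(42, dest=0, tag=0)           # blocking, pickle-based send
elif rank == 0:
    value = comm.recv(source=1, tag=0)     # matching receive
    print("rank 0 received %d" % value)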
In order to fix this error, let us clarify the algorithm a bit. We want each of the ranks from 1 to 9 to be responsible for one element of the 3x3 matrix. Each of these ranks posts a comm.send request. The requests are received one by one by rank 0 and stored in the corresponding elements of the matrix. Any remaining ranks, if they are available, do nothing.
Let us introduce three changes:

1. Define the matrix dimension dim explicitly, since it is never defined in the original code.
2. Replace the loop over ranks with a condition: each rank from 1 to dim * dim sends its own value to rank 0, rank 0 does the receiving, and any extra ranks do nothing.
3. Fix the mapping from the matrix element mat[i,j] to the sending rank, which currently is not correct (e.g. for the central element mat[1,1] the rank should be 5, not 1 * 1 + 1 = 2).

Here is what I got after the modifications:
from __future__ import division
from mpi4py import MPI
import os
import time
import numpy as np

comm = MPI.COMM_WORLD
nproc = comm.Get_size()
rank = comm.Get_rank()

start_time = time.time()

dim = 3
mat = np.zeros((dim,dim))
comm.bcast([ mat , MPI.DOUBLE], root=0)

if rank > 0:
    # Worker ranks 1 .. dim*dim each send their own rank number to rank 0.
    if rank <= dim * dim:
        print "I'm processor: ", rank
        var = rank
        req = comm.send( var, dest=0, tag = (rank*1000) )
        print "Processor: ", rank, " finished working."
else:
    # Rank 0 receives one value per matrix element and stores it.
    print "Hello! I'm the master processor, rank: ", rank
    for i in range(0,dim):
        for j in range(0, dim):
            proc = ((i*dim)+j)+1
            if proc < nproc:
                mat[i,j] += comm.recv(source=proc, tag=(proc*1000) )
    np.savetxt('mat.txt', mat)
And here is the output:
mpirun -np 5 python mpi4.py saves the following matrix to mat.txt:
1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
4.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00
0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00
And mpirun -np 32 python mpi4.py saves the following matrix to mat.txt:
1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00
Note that 10 (one master rank plus dim * dim = 9 worker ranks) is the minimal number of process ranks that would produce the complete result.
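As a quick check (a sketch assuming the same dim = 3 and the rank mapping used in the corrected code), the following prints which rank fills which element and shows why one master plus dim * dim workers, i.e. 10 ranks, are needed for the full matrix:

dim = 3

# Element (i, j) is filled by the value received from rank i*dim + j + 1,
# matching proc = ((i*dim)+j)+1 in the corrected code above.
for i in range(dim):
    for j in range(dim):
        proc = i * dim + j + 1
        print("mat[%d,%d] is filled by rank %d" % (i, j, proc))

# Ranks 1..dim*dim fill the matrix and rank 0 collects them,
# so 1 + dim*dim = 10 processes are the minimum for a complete result.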
Upvotes: 5