I want to run a simple parallelized MPI Python code on an HPC system using multiple nodes.
SLURM is the job scheduler on the HPC, which consists of 3 nodes with 36 cores each. Open MPI and MPICH are both available as MPI implementations.
The code I want to run is as follows:
import sys
import numpy as np
import socket
import time
from mpi4py.futures import MPIPoolExecutor

# Define simple function
def myFun(x):
    time.sleep(5)
    print('Process is running on host: %s' % (socket.gethostname()))
    return x+2

if __name__ == '__main__':
    timestamp1 = time.perf_counter()
    # Create small set of random input data
    dat = [np.random.rand(3, 2) for x in range(8)]
    # Using mpi4py for multiprocessing
    with MPIPoolExecutor(max_workers=8) as pool:
        # Run function with myFun and dat as map operation
        result = pool.map(myFun, dat)
    timestamp2 = time.perf_counter()
    delta_t = timestamp2 - timestamp1
    print('Runtime of code: ', delta_t)
This code is really simple and just used to understand how to get it working. It is based on a suggested answer from Hristo 'away' Iliev in this thread Python: how to parallelizing a simple loop with MPI, with some minor changes. I really like this approach, as the use case I actually need to rewrite uses multiprocessing's Pool class.
My default *.sbatch file is basically set up as follows:
#!/bin/bash
# SLURM Setup -------------------------------------------------
#SBATCH --job-name=Test_MPI
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=9
#SBATCH --mem=1G
module load ...
eval "$(conda shell.bash hook)"
conda activate ...
srun -n 1 python -m mpi4py.futures stackExample2.py
In this case I'm reserving 9 processors, since I use 1 master process and 8 worker processes (defined with MPIPoolExecutor(max_workers=8)). Run like this on a single node, it works fine.
When I want to use this code with more than 36 processes, I need to be able to run it on multiple nodes, but I couldn't get that right so far. First I only adjusted #SBATCH --nodes and --ntasks-per-node: I set nodes=2 and ntasks-per-node=5, as I need to make sure at least 9 processes are reserved. When checking the *.err file I got this:
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
It ignores that --nodes=2 was set and proceeds as if --nodes=1 and --ntasks-per-node=9 were set. When looking at the *.out file, unsurprisingly only hostname1 was printed by the function.
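For clarity, the only lines I changed compared to the batch file above were the node/task directives; the rest of the file, including the srun line, stayed the same:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=5

srun -n 1 python -m mpi4py.futures stackExample2.py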
I read a lot of posts, examples and introductions from various sources. In most cases these used a simple srun command like "srun python myprogram.py". So I tried:
srun python -m mpi4py.futures stackExample2.py
This time my code was running on both hosts, but it was run 5 times on hostname1 and 5 times on hostname2. That was not what I intended, as I want to run the code once, using processors from both hosts.
I tried various other possibilities, including using mpirun/mpiexec instead of srun, -host options, etc., but still couldn't get it right. This raised the question whether the code itself is missing something.
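For example, the alternative launch commands I tried looked roughly like the following (the host names are placeholders, and these are sketches from memory rather than exact copies of what I ran):
# Attempts with mpirun/mpiexec instead of srun (node01/node02 are placeholder host names)
mpirun -np 9 python -m mpi4py.futures stackExample2.py
mpiexec -n 9 -host node01,node02 python -m mpi4py.futures stackExample2.py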
Do I need to change the code in order to use processes on multiple nodes? Or am I still using the wrong srun call?
Thank you in advance!
I found the solution to my problem.
First of all: yes, you can run this code once while utilising processors from multiple hosts.
As you can probably tell, I'm pretty new to writing code intended for parallel or distributed execution. My lack of experience led me down the wrong path, as I assumed everything was set up perfectly.
The Open MPI installation isn't working properly. My best guess is that something went wrong while setting it up or compiling it. This was relatively hard to realize.
I saw a bunch of examples using a simple HelloWorld code (mostly C and Python) to demonstrate the general possibility of running code in a distributed way, but I couldn't recreate the results provided by these examples. I made sure to use the same code and the same mpirun/mpiexec or srun call, but instead of returning something like:
Hello, I'm rank 0 of 3 running on host1
Hello, I'm rank 1 of 3 running on host2
Hello, I'm rank 2 of 3 running on host1
Hello, I'm rank 3 of 3 running on host2
My result, however, looked like this:
Hello, I'm rank 0 of 3 running on host1
Hello, I'm rank 0 of 3 running on host2
Hello, I'm rank 0 of 3 running on host1
Hello, I'm rank 0 of 3 running on host2
This is when I realised that something is wrong with my MPI installation, though I couldn't really pinpoint the exact problem. I saw some posts that used the --mca flag to set up a preferred communication channel, but couldn't really make anything out of it. As I mentioned in the initial question, we have two MPI implementations we can load as modules: Open MPI v3.1.3 (loaded as default) and MPICH 3.3.
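For anyone who wants to run the same kind of sanity check: mpi4py ships a small built-in hello-world benchmark that prints rank, size and host name, which makes it easy to see whether the ranks actually join one MPI job across nodes (assuming a reasonably recent mpi4py; adjust the process count to your allocation):
# Quick check of the MPI installation using mpi4py's built-in hello-world
mpiexec -n 4 python -m mpi4py.bench helloworld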
I switched to MPICH and ran the exact same HelloWorld code with an mpiexec call, which now yielded the expected result. Subsequently I ran the code from the initial question and was able to run it once using processors from multiple hosts. Even though I found a solution, I'll contact the HPC administrator and try to figure out what is wrong with our Open MPI installation.
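For reference, a batch file along these lines worked for me with MPICH; the module name and the exact launch line are specific to my cluster, so treat this as a sketch rather than a drop-in solution:
#!/bin/bash
# SLURM Setup -------------------------------------------------
#SBATCH --job-name=Test_MPI
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=5
#SBATCH --mem=1G

module load mpich    # placeholder: whatever module provides MPICH 3.3 on your cluster
eval "$(conda shell.bash hook)"
conda activate ...

# With more than one process, "python -m mpi4py.futures" uses rank 0 as the
# master and the remaining ranks as workers, so no dynamic spawning is needed.
mpiexec -n 9 python -m mpi4py.futures stackExample2.py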