Python: How to run simple MPI code on multiple nodes?

Question

I want to run a simple parallelized MPI python code on a HPC using multiple nodes.

SLURM is set as the job scheduler for the HPC. The HPC consists of 3 nodes with 36 cores each. Open MPI and MPICH are both available as MPI Implementations.

The code I want to run is as follows:

import sys
import numpy as np
import socket
import time
from mpi4py.futures import MPIPoolExecutor

# Define simple function
def myFun(x):
    time.sleep(5)
    print('Process is running on host: %s' % (socket.gethostname()))
    return x+2

if __name__ == '__main__':
    timestamp1 = time.perf_counter()
    # Create small set of random input data
    dat = [np.random.rand(3, 2) for x in range(8)]
    
    # Using mpi4py for multiprocessing
    with MPIPoolExecutor(max_workers=8) as pool:
        # Run function with myFun and dat as map operation
        result = pool.map(myFun, dat)
    
    timestamp2 = time.perf_counter()
    delta_t = timestamp2 - timestamp1
    
    print('Runtime of code: ', delta_t)

This code is really simple and just used to understand how to get it working. The code based on a suggested answer from Hristo 'away' Iliev in this thread Python: how to parallelizing a simple loop with MPI with some minor changes. I really like this code as the use case i actually need to rewrite uses multiprocessing's pool class.

My default *.sbatch file is basically setup as follows:

#!/bin/bash

# SLURM Setup -------------------------------------------------

#SBATCH --job-name=Test_MPI
#SBATCH --output=job.%j.out         
#SBATCH --error=job.%j.err
#SBATCH --nodes=1           
#SBATCH --ntasks-per-node=9     
#SBATCH --mem=1G            

module load ...
eval "$(conda shell.bash hook)"
conda activate ...

srun -n 1 python -m mpi4py.futures stackExample2.py

In this case I'm reserving 9 processors, since I use 1 Master process and 8 Worker processes (defined with MPIPoolExecutor(max_workers=8)). When just parallelizing it this works fine.

When I want to use this code with >36 processes I need to be able to run it on multiple nodes. But I couldn't get it right so far. First I only adjustes #SBATCH --nodes and --ntasks-per-node. In this case I set nodes = 2 and ntasks-per-node=5, as I need to make sure to have at least 9 processes reserved. When checking the *.err-file I got this:

srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1

It ignores that --nodes =2 was set, and proceeds to perfrom as if --nodes=1 and --ntasks-per-node=9 was set. When looking at the *.out-file obviously only hostname1 was printed by the function.

I read a lot of posts, examples and introduction from various source. In most cases these used a simple srun command like "srun python myprogram.py". So I tried:

srun python -m mpi4py.futures stackExample2.py

This time my code was running on both hosts, but it was run 5 times on hostname1 and 5 times on hostname 2. Which was not intended as I want to run the code once using processors from host1 and host2.

I tried various other possiblities including using mpirun/mpiexec instead of srun, -host options and etc. but still couldn't get it right. This raised the question whether the code is missing something.

Do I need to change the code in order to use processes on multiple nodes? Or do I still use the wrong srun call?

Thank you in advance!

Python: How to run simple MPI code on multiple nodes?

Answers (1)

Related Questions