Quantum Monte Carlo

Reputation: 1

Slurm: Use cores from multiple nodes for Python parallelization

This question is similar to this one, Slurm: Use cores from multiple nodes for R parallelization, but it is about Python.

I have a Python program that can use multiple cores on a PC. It does not use MPI or OpenMP; instead, it uses the CreateProcess function to run work on each CPU core (or thread) of the PC.

I wonder whether, in Slurm, a Python program like this can run using cores from multiple nodes. Slurm can of course assign the program to one node, and if a node in the cluster has 20 cores, the program runs there just as it would on a laptop with 20 cores. But I want to use 100 cores, which requires 5 nodes. Is it possible for this Python program to use all 100 cores across the 5 nodes?

I tried the script below and submitted the job to Slurm:

#!/bin/bash
#SBATCH -n 100                      # Number of tasks (one CPU core each by default)
#SBATCH -t 0-24:00                  # Wall time (D-HH:MM), set to 24h to test
#SBATCH -p slgrid                   # Partition
#SBATCH -o myoutput.%j.out          # STDOUT (%j = JobId)
#SBATCH -e myoutput.%j.err          # STDERR (%j = JobId)
#SBATCH --mail-type=ALL             # Send mail when the job starts, stops, or fails

# The Python package is installed in a venv; activate it first.
source /scratch/xxx/.venv/bin/activate
# The Python run command
python -m darwin.run_search template.txt tokens.json options.json

Slurm successfully allocated 5 nodes with 100 cores in total to this job, but when I checked, the Python program seemed to run only on the first node, using its 20 cores. The other 80 cores on the remaining 4 nodes were idle.
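To see this directly, the program (or a small script run in its place) can print the host it is running on and how many cores it can actually use, next to what Slurm allocated. The sketch below is a minimal, standard-library-only check under some assumptions: it reads the Slurm environment variables `SLURM_JOB_NUM_NODES` and `SLURM_NTASKS`, which sbatch sets for the batch script, and the function names `local_cpu_count` and `describe_allocation` are hypothetical. It reflects the usual explanation for this behaviour: a plain `python ...` line in the batch script starts a single process on the first allocated node, so any worker processes it spawns can only use that node's cores.

```python
import os
import socket


def local_cpu_count() -> int:
    """Number of cores this process may run on, on the machine it runs on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours any CPU affinity mask, such as one Slurm may apply.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def describe_allocation(env=None) -> dict:
    """Compare what Slurm allocated with what this single process can use.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    nodes = int(env.get("SLURM_JOB_NUM_NODES", "1"))
    tasks = int(env.get("SLURM_NTASKS", "1"))
    local = local_cpu_count()
    return {
        "host": socket.gethostname(),
        "allocated_nodes": nodes,
        "allocated_tasks": tasks,
        "usable_cores_here": local,
        # Assumes one core per task (the default for -n): anything beyond
        # the cores visible here sits on other nodes this process cannot use.
        "unreachable_cores": max(tasks - local, 0),
    }


if __name__ == "__main__":
    print(describe_allocation())
```

With `-n 100` on 20-core nodes, one would expect this to report around 20 usable cores on a single host, with the remaining cores out of reach of that one process.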

Again, is there a way to let such a Python program use cores from different nodes?

Or is there a Python package or some other tool that would let me achieve this?

Many thanks in advance!

I have tried different options in the sbatch script, but this Python program always seems to run on just one node.

Upvotes: 0

Views: 43

Answers (1)

ciaron

Reputation: 1169

mpi4py is the standard way of writing multi-node parallel code in Python. It will require you to rewrite your code, though.
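As a rough illustration of what that rewrite involves, the sketch below splits a list of independent work items across MPI ranks with a round-robin split and gathers the results on rank 0 using `COMM_WORLD.gather`. `evaluate` and the integer work items are hypothetical stand-ins for whatever the real program computes, and the single-process fallback when mpi4py is not installed exists only so the sketch runs anywhere. In a batch script it would typically be launched with `srun python script.py` (or `mpirun`/`mpiexec`, depending on how MPI is set up on the cluster), so that one Python process starts per allocated task across all 5 nodes.

```python
try:
    from mpi4py import MPI  # third-party: needs mpi4py plus an MPI library
    COMM = MPI.COMM_WORLD
except ImportError:
    COMM = None  # mpi4py missing: behave as a single rank so the sketch still runs


def rank_and_size():
    """This process's rank and the total number of ranks."""
    if COMM is None:
        return 0, 1
    return COMM.Get_rank(), COMM.Get_size()


def evaluate(item):
    """Hypothetical stand-in for one expensive, independent evaluation."""
    return item * item


def run(items):
    """Evaluate items across all ranks; rank 0 returns results in input order."""
    rank, size = rank_and_size()
    # Round-robin split: rank r handles indices r, r + size, r + 2*size, ...
    local = [(i, evaluate(items[i])) for i in range(rank, len(items), size)]
    if COMM is None:
        gathered = [local]
    else:
        # Collect every rank's (index, result) pairs on rank 0; others get None.
        gathered = COMM.gather(local, root=0)
    if rank != 0:
        return None
    # Indices are unique, so sorting the pairs restores the input order.
    merged = sorted(pair for part in gathered for pair in part)
    return [value for _, value in merged]


if __name__ == "__main__":
    results = run(list(range(10)))
    if results is not None:
        print(results)
```

Each rank is a separate Python process, possibly on a different node, so work items and results have to be passed explicitly rather than shared. That is largely why moving from a single-machine, process-spawning design to mpi4py usually means restructuring the code.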

Upvotes: 1
