Reputation: 33
I'm trying to use OpenMPI and Slurm to run a simple hello world program. The goal is to use an #SBATCH script like the one below.
In my .bashrc I added to $PATH and $LD_LIBRARY_PATH, and they both contain /shared/centos7/openmpi/3.1.2/bin.
When I run the SBATCH script with srun ~/hello-mpi.x
its output is what I'd expect:
Hello World from process 15 from the Node c0625. There are a total of 32 processes.
Hello World from process 15 from the Node c0626. There are a total of 32 processes.
Two nodes (c0625 and c0626) and 32 processes, as specified in the SBATCH script below.
When I run the SBATCH script with mpirun ~/hello-mpi.x
I get this error:
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
And if I run the SBATCH script with srun mpirun ~/hello-mpi.x, I get this error:
slurmstepd: error: execve(): mpirun: No such file or directory
This is the SBATCH script:
#!/bin/bash
#SBATCH --verbose
#SBATCH --export=ALL
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --job-name=JonsJob
#SBATCH --mem=100G
#SBATCH --partition=short
srun ~/hello-mpi.x
On the command line, if I run ~/hello-mpi.x with either mpirun or srun, I get output from a single node (I did not use salloc to request another node):
Hello World from process 2 from the Node c0170. There are a total of 4 processes.
But if I use srun mpirun ~/hello-mpi.x, I get an error:
slurmstepd: error: execve(): mpirun: No such file or directory
The hello world code is below, along with my .bashrc.
/* The Parallel Hello World Program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                           /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);             /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);             /* total number of processes */
    MPI_Get_processor_name(processor_name, &namelen); /* node this rank runs on */

    printf("Hello World from process %d from the Node %s. There are a total of %d processes.\n",
           rank, processor_name, size);

    MPI_Finalize();
    return 0;
}
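For completeness, the binary was built with the OpenMPI compiler wrapper, roughly like this (the source file name is an assumption):
# compile with the OpenMPI wrapper; requires the openmpi module (or equivalent PATH setup)
mpicc -o ~/hello-mpi.x hello-mpi.c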
Also, the environment variables $PATH and $LD_LIBRARY_PATH are set up in my .bashrc file:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
module load openmpi
module load cuda/9.2
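The module load openmpi line is what actually puts OpenMPI on those paths; doing it by hand would look roughly like this (the lib directory is an assumption, inferred from the bin path above):
# rough manual equivalent of "module load openmpi"; lib path assumed from the bin path above
export PATH=/shared/centos7/openmpi/3.1.2/bin:$PATH
export LD_LIBRARY_PATH=/shared/centos7/openmpi/3.1.2/lib:$LD_LIBRARY_PATH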
Upvotes: 1
Views: 1319
Reputation: 2099
See the Slurm FAQ on this.
Basically, .bashrc isn't loaded by Slurm.
Either source the .bashrc file from your sbatch script or add your "module load" commands within the sbatch script.
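For example, a minimal sketch of the second option, reusing the directives and module name from the question (adjust to your site):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --partition=short
# .bashrc is not sourced for the batch script, so set up the MPI environment here
module load openmpi
# or instead: source ~/.bashrc
srun ~/hello-mpi.x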
Upvotes: 0