Atul Gupta

Reputation: 217

Running OpenMP on a cluster

I have to run an OpenMP program on a cluster with different configurations (such as different numbers of nodes). The problem I am facing is that whenever I try to run the program with, say, 2 nodes, the same piece of the program runs 2 times instead of running in parallel.

My program -

gettimeofday(&t0, NULL);
for (k=0; k<size; k++) {
    #pragma omp parallel for shared(A)
    for (i=k+1; i<size; i++) {
        //parallel code
    }
    #pragma omp barrier
    for (i=k+1; i<size; i++) {
        #pragma omp parallel for
        //parallel code
    }
}

gettimeofday(&t1, NULL);
printf("Did %u calls in %.2g seconds\n", i, t1.tv_sec - t0.tv_sec + 1E-6 * (t1.tv_usec - t0.tv_usec));

It is an LU decomposition program. When I run it on 2 nodes, I get output something like this:
Did 1000 calls in 5.2 seconds
Did 1000 calls in 5.3 seconds
Did 2000 calls in 41 seconds
Did 2000 calls in 41 seconds

As you can see, the program is run two times for each value (1000, 2000, 3000, ...) instead of running in parallel. It is my homework program, but I am stuck at this point.

I am using a SLURM script to run this program on my college's computing cluster. This is the standard script provided by the professor.

#!/bin/sh
##SBATCH --partition=general-compute
#SBATCH --time=60:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
##SBATCH --mem=24000
# Memory per node specification is in MB. It is optional. 
# The default limit is 3GB per core.
#SBATCH --job-name="lu_openmpnew2nodes"
#SBATCH --output=luopenmpnew1node2task.out
#SBATCH --mail-user=***@***.edu
#SBATCH --mail-type=ALL
##SBATCH --requeue
#Specifies that the job will be requeued after a node failure.
#The default is that the job will not be requeued.


echo "SLURM_JOBID="$SLURM_JOBID
echo "SLURM_JOB_NODELIST"=$SLURM_JOB_NODELIST
echo "SLURM_NNODES"=$SLURM_NNODES
echo "SLURMTMPDIR="$SLURMTMPDIR

cd $SLURM_SUBMIT_DIR
echo "working directory = "$SLURM_SUBMIT_DIR

module list
ulimit -s unlimited
#

echo "Launch luopenmp with srun"
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
for i in {1000..20000..1000}
do
srun ./openmpNew "$i"
done

#
echo "All Done!"

Upvotes: 4

Views: 5560

Answers (1)

Alexander Vogt

Reputation: 18098

Be careful, you are confusing MPI and OpenMP here.

OpenMP works with threads, i.e. in shared memory; threads do not communicate across the nodes of a distributed-memory system (there are techniques to do so, but they are not performant enough).

What you are doing is starting the same program separately on each of the two nodes. If you were using MPI, this would be fine. But in your case you start two processes, each with a default number of threads, and those two processes are independent of each other.
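
To see this yourself, here is a minimal diagnostic you could launch through the same srun line (the file name probe.c and everything in it are illustrative, not part of your program): each launched copy reports its own hostname, PID, and OpenMP thread team, which makes the duplication visible.

/* probe.c -- illustrative diagnostic (hypothetical file, not from the question).
   Compile with: gcc -fopenmp probe.c -o probe
   Every process that srun launches prints its own PID and thread team, so two
   independent copies show up as two different PIDs in the output. */
#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void)
{
    char host[256];
    gethostname(host, sizeof(host));

    #pragma omp parallel
    {
        #pragma omp critical
        printf("host %s, pid %d, thread %d of %d\n",
               host, (int) getpid(),
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}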

I would suggest some further study of shared-memory parallelization (like OpenMP) and distributed-memory parallelization (like MPI). There are tons of tutorials out there, and I would recommend the book "Introduction to High Performance Computing for Scientists and Engineers" by Hager and Wellein.

To try your program, start on one node and specify OMP_NUM_THREADS, like:

OMP_NUM_THREADS=1 ./openmpNew "$i"
OMP_NUM_THREADS=2 ./openmpNew "$i"
...

Here is an example script for SLURM: link.
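
The linked script is not reproduced here, but a minimal sketch of what such a job script could look like (assuming a single task that gets its cores via --cpus-per-task, and reusing the openmpNew executable and loop from the question; the core count and time limit are placeholders):

#!/bin/sh
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1        # one process: OpenMP parallelism lives inside it
#SBATCH --cpus-per-task=8          # cores reserved for the OpenMP threads
#SBATCH --time=01:00:00
#SBATCH --job-name="lu_openmp"
#SBATCH --output=lu_openmp.out

# Let OpenMP use exactly the cores SLURM reserved for this task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

for i in {1000..20000..1000}
do
    srun ./openmpNew "$i"
done

With --ntasks-per-node=1, srun launches the program once per problem size, and the parallel speedup comes from the OpenMP threads inside that single process.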

Upvotes: 10
