dx_mrt

Reputation: 717

error on running mpi job

I'm trying to run an MPI job on a cluster with Torque and Open MPI 1.3.2 installed, and I always get the following error:

mpirun was unable to launch the specified application as it could not find an executable:

Executable: -p
Node: compute-101-10.local

while attempting to start process rank 0.

I'm using the following script to do the qsub:

#PBS -N mphello
#PBS -l walltime=0:00:30
#PBS -l nodes=compute-101-10+compute-101-15
cd $PBS_O_WORKDIR
mpirun -npersocket 1 -H compute-101-10,compute-101-15 /home/username/mpi_teste/mphello

Any idea why this happens? What I want is to run one process on each node (compute-101-10 and compute-101-15). What am I getting wrong here? I've already tried several combinations of mpirun options, but either the program runs on only one node or I get the error above.

Thanks in advance!

Upvotes: 0

Views: 3537

Answers (2)

dx_mrt

Reputation: 717

The problem is that the -npersocket flag is supported by Open MPI 1.3.2, but the cluster where I'm running my code only has Open MPI 1.2, which doesn't support that flag.

A possible workaround is to use the -loadbalance flag and list the nodes where I want the code to run with the flag -H node1,node2,node3,... like this:

mpirun -loadbalance -H node1,node2,...,nodep -np number_of_processes program_name

That way each node will run number_of_processes/p processes, where p is the number of nodes on which the processes will run.
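As a rough sketch of the arithmetic above, the snippet below builds the command line for a given host list and a desired number of processes per node. The helper name build_mpirun_cmd is made up for illustration and is not part of Open MPI; the -loadbalance and -H flags are used exactly as in the command above, and the host names and program path are the ones from the question. It only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print an mpirun command line that
# starts $2 processes on each host in the comma-separated list $1.
build_mpirun_cmd() {
    hosts=$1
    per_node=$2
    program=$3
    # Count the non-empty entries in the host list (p in the text above).
    node_count=$(printf '%s\n' "$hosts" | tr ',' '\n' | grep -c .)
    # number_of_processes = p * processes per node
    total=$((node_count * per_node))
    echo "mpirun -loadbalance -H $hosts -np $total $program"
}

# One process on each of the two nodes from the question.
build_mpirun_cmd compute-101-10,compute-101-15 1 /home/username/mpi_teste/mphello
```

For the two nodes in the question this gives -np 2, so number_of_processes/p works out to one process per node.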

Upvotes: 0

Dima Chubarov

Reputation: 17179

The -npersocket option did not exist in OpenMPI 1.2.

The diagnostic that OpenMPI reported,

mpirun was unable to launch the specified application as it could not find an executable: Executable: -p

is exactly what mpirun in OpenMPI 1.2 would say if called with this option.

Running mpirun --version inside the job script will show which version of OpenMPI is the default on the compute nodes.
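A minimal sketch of such a check, meant to be placed in the job script so that it runs on a compute node. The helper names parse_ompi_version and version_at_least are made up for illustration. It assumes the version output contains a string of the form "mpirun (Open MPI) 1.2.8", and it uses 1.3.2 as the threshold for -npersocket only because that is the version mentioned in this thread, not because it is a confirmed minimum.

```shell
#!/bin/sh
# Hypothetical helper: print the version number found in the text of
# `mpirun --version`. Assumes a string such as "mpirun (Open MPI) 1.2.8".
parse_ompi_version() {
    printf '%s\n' "$1" | sed -n 's/.*(Open MPI) \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 >= version $2, comparing up to
# three dot-separated numeric fields (missing fields count as 0).
version_at_least() {
    have=$1
    want=$2
    for field in 1 2 3; do
        h=${have%%.*}
        w=${want%%.*}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        # Drop the field just compared; pad with 0 when none are left.
        case $have in *.*) have=${have#*.} ;; *) have=0 ;; esac
        case $want in *.*) want=${want#*.} ;; *) want=0 ;; esac
    done
    return 0
}

# Only query mpirun if it is on the PATH of the node running this script.
if command -v mpirun >/dev/null 2>&1; then
    version=$(parse_ompi_version "$(mpirun --version 2>&1)")
    echo "mpirun on $(hostname): Open MPI ${version:-unknown}"
    # Assumption: 1.3.2 is the threshold, as suggested by the thread.
    if [ -n "$version" ] && version_at_least "$version" 1.3.2; then
        echo "-npersocket should be available"
    else
        echo "-npersocket is probably not available"
    fi
fi
```

The output ends up in the job's output file, which makes it easy to compare the version on the compute nodes with the one on the login node.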

Upvotes: 1
