Reputation: 261
I am using the srun command to submit a computational job to a Linux cluster, but the output is duplicated. Here is the job submission script.
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name="vasp"
#SBATCH --nodes=2
#SBATCH --time=24:00:00
#SBATCH --constraint=ib
#SBATCH --exclusive
#SBATCH --err=std.err
#SBATCH --output=std.out
#----------------------------------------------------------#
export OMP_NUM_THREADS=1
#----------------------------------------------------------#
echo "The job ${SLURM_JOB_ID} is running on ${SLURM_JOB_NODELIST}"
#----------------------------------------------------------#
source /shared/centos7/intel/oneapi/2021.1_u9-base/setvars.sh
srun --ntasks=40 --hint=nomultithread --ntasks-per-node=20 --ntasks-per-socket=2 --ntasks-per-core=1 --mem-bind=v,local /work/bin/v_c
Here is the duplicated output data.
:: oneAPI environment initialized ::
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
...
N E dE d eps ncg rms rms(c)
N E dE d eps ncg rms rms(c)
DAV: 1 0.980384438844E+03 0.98038E+03 -0.43531E+04 6372 0.144E+03
DAV: 1 0.980384438844E+03 0.98038E+03 -0.43531E+04 6372 0.144E+03
DAV: 1 0.980384438844E+03 0.98038E+03 -0.43531E+04 6372 0.144E+03
...
DAV: 55 -0.911176657386E+02 -0.16384E-05 -0.23427E-04 6760 0.627E-02 0.587E-03
DAV: 54 -0.911176641002E+02 -0.12570E-05 -0.43068E-04 6600 0.795E-02 0.559E-03
DAV: 55 -0.911176657386E+02 -0.16384E-05 -0.23427E-04 6760 0.627E-02 0.587E-03
DAV: 56 -0.911176678701E+02 -0.21315E-05 -0.36418E-04 6648 0.730E-02 0.762E-03
DAV: 54 -0.911176641002E+02 -0.12570E-05 -0.43068E-04 6600 0.795E-02 0.559E-03
DAV: 54 -0.911176641002E+02 -0.12570E-05 -0.43068E-04 6600 0.795E-02 0.559E-03
DAV: 55 -0.911176657386E+02 -0.16384E-05 -0.23427E-04 6760 0.627E-02 0.587E-03
Each line should appear only once, like the following.
N E dE d eps ncg rms rms(c)
DAV: 1 0.980384438844E+03 0.98038E+03 -0.43531E+04 6372 0.144E+03
...
DAV: 54 -0.911176641002E+02 -0.12570E-05 -0.43068E-04 6600 0.795E-02 0.559E-03
DAV: 55 -0.911176657386E+02 -0.16384E-05 -0.23427E-04 6760 0.627E-02 0.587E-03
DAV: 56 -0.911176678701E+02 -0.21315E-05 -0.36418E-04 6648 0.730E-02 0.762E-03
Would anyone please help me modify my shell script file to sort out this problem?
Many thanks.
Upvotes: 0
Views: 577
Reputation: 37228
Most likely you're not using the MPI version of VASP, so instead it starts two instances of the serial version on the two nodes you have allocated.
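One way to check this is to look at the libraries the binary is linked against. This is a minimal sketch, assuming the binary is dynamically linked; is_mpi_linked is a hypothetical helper name, and /work/bin/v_c is the path from the question.

```shell
#!/bin/bash
# Sketch only: is_mpi_linked is a hypothetical helper, not part of Slurm or VASP.
# It reports whether a dynamically linked binary depends on an MPI library.
is_mpi_linked() {
    # ldd lists the shared libraries a binary needs at run time;
    # an MPI build normally shows a libmpi entry.
    ldd "$1" 2>/dev/null | grep -qi 'libmpi'
}

# Path taken from the srun line in the question.
if is_mpi_linked /work/bin/v_c; then
    echo "v_c appears to be linked against MPI"
else
    echo "v_c does not appear to be linked against MPI (or is statically linked)"
fi
```

If the binary does turn out to be an MPI build, the "PMI server not found" lines in your output suggest another possibility: Intel MPI could not reach Slurm's PMI, so each task ran as a singleton. In that case, as the message itself says, I_MPI_PMI_LIBRARY would need to point at your cluster's Slurm PMI library, whose location is site-specific.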
As an aside, --ntasks-per-node=20 --ntasks-per-socket=2 looks nonsensical unless you really have nodes with 10 sockets.
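The arithmetic behind that remark can be sketched as follows, using the two values from the srun line in the question and assuming --ntasks-per-socket acts as an upper limit on tasks per socket:

```shell
#!/bin/bash
# Values taken from the srun line in the question.
ntasks_per_node=20
ntasks_per_socket=2

# Minimum number of sockets per node needed to place all tasks
# when each socket may hold at most ntasks_per_socket tasks (ceiling division).
min_sockets=$(( (ntasks_per_node + ntasks_per_socket - 1) / ntasks_per_socket ))

echo "${ntasks_per_node} tasks per node at ${ntasks_per_socket} per socket need at least ${min_sockets} sockets"
```

Running lscpu on a compute node shows the actual socket count to compare against.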
Upvotes: 1