Reputation: 1
I am new to GNU Parallel and am trying to run a few simulations. I have a bash script which I submit to a cluster via SLURM; the script is given below. Essentially, parallel calls a function run_simulation, which in turn calls other bash scripts. These scripts generate output in the current working directory, which is different for each job.
#!/bin/bash
# Job name:
#SBATCH --job-name=Run_MD_Sim
#
# Account:
#SBATCH --account=fc_mllam
#
# Partition:
#SBATCH --partition=savio3
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit:
#SBATCH --time=5:00:30
#
## Command(s) to run (example):
module load intel
module load openmpi
module load gcc
module load cmake
module load gnu-parallel/2019.03.22
energy_list=("90")
fluence_list=("1000")
len_energy=${#energy_list[@]}
len_fluence=${#fluence_list[@]}
# Change this line if number of nodes requested is changed
val="ALE_Cycle_Run_2.sh"
# Function to run MD simulation for a single combination of energy and fluence
run_simulation() {
enval="$1"
flval="$2"
counter="$3"
val="$4"
# Create a directory to carry out computations. If node=1, then we are in fcmd_bondorder
mkdir "Temp_Directory_$counter"
# Check if using more than one node. If more than one node is used, the working directory will be the home directory and the lines below will need to change
cp ../temp_000588-322.cfg "Temp_Directory_$counter/temp_000000-000.cfg"
# Copy simulation files into this folder
cp *.o "Temp_Directory_$counter/"
cp *.cpp "Temp_Directory_$counter/"
cp *.h "Temp_Directory_$counter/"
cp Makefile "Temp_Directory_$counter/"
cp "$val" "Temp_Directory_$counter/"
cp Bond_Param_Gen.sh "Temp_Directory_$counter/Bond_Param_Gen.sh"
# Change into the temporary directory; bail out if that fails so the commands below don't run in the wrong directory
cd "Temp_Directory_$counter" || exit 1
# Run the main MD simulation. The output will be stored in the current directory
bash "$val" "$enval" "$flval"
# Make a directory to store the bond-order files
mkdir Data/
# Discard the text output of the main simulation run
mv *.txt Data/
rm Data/*.txt
bash Bond_Param_Gen.sh
mv *.txt Data/
mv *.cfg Data/
# Home directory or scratch directory
directory="/global/home/users/shoubhaniknath"
new_filename="Data ${flval} impacts energy ${enval} number ${counter}"
# Rename and move the data folder
mv "Data" "$directory/$new_filename"
}
# Export the function so that GNU Parallel can access it
export -f run_simulation
# Set number of concurrent jobs from the cores available on the node and the CPUs requested per task
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
# Run simulations in parallel
for enval in "${energy_list[@]}"; do
for flval in "${fluence_list[@]}"; do
# Use GNU Parallel to parallelize the loop over 'counter'
# Use below line for multiple nodes
# parallel --dry-run --jobs $JOBS_PER_NODE --slf hostfile run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
# For single node, use below line
echo $JOBS_PER_NODE
parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
done
done
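For reference, JOBS_PER_NODE is just integer division of the node's core count by --cpus-per-task. A standalone sketch with made-up values (SLURM_CPUS_ON_NODE and SLURM_CPUS_PER_TASK are only set by SLURM inside a running job, so they are hard-coded here):

```shell
# Stand-in values: SLURM would export these inside the job
SLURM_CPUS_ON_NODE=32
SLURM_CPUS_PER_TASK=2
# One parallel job per task's worth of CPUs
JOBS_PER_NODE=$(( SLURM_CPUS_ON_NODE / SLURM_CPUS_PER_TASK ))
echo "$JOBS_PER_NODE"   # prints 16
```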
My issue is that the progress bar from parallel --bar is not printed, and I have no idea why. Simple parallel commands executed in the current working directory do show the progress bar. What am I doing wrong here?
Upvotes: 0
Views: 83
Reputation: 1
Figured it out later on. The progress bar is only displayed on the compute node, so to see it one should run the script interactively with srun instead of submitting it with sbatch.
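For example, something like the following (partition, account, and resource flags taken from the script in the question) gives an interactive shell on a compute node, where --bar can draw on a real terminal:

```
srun --partition=savio3 --account=fc_mllam --nodes=1 --cpus-per-task=2 --pty bash
```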
Upvotes: 0
Reputation: 33740
Try something like this:
parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
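With three ::: sources, parallel runs the command once for every combination of the inputs, and {1}, {2}, {3} refer to an item from the first, second, and third source respectively, so the two shell loops are no longer needed. A pure-bash sketch of the combinations this generates (using the same lists as the question):

```shell
energy_list=("90")
fluence_list=("1000")
# Same Cartesian product parallel builds from
# ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
out=$(
  for counter in 1 2 3; do                  # {1}
    for enval in "${energy_list[@]}"; do    # {2}
      for flval in "${fluence_list[@]}"; do # {3}
        echo "run_simulation $enval $flval $counter"
      done
    done
  done
)
echo "$out"
```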
Upvotes: 0