Reputation: 55
I am using srun to submit a bash script in parallel with a different input variable for each execution. Basically my submit script looks as follows:
#!/bin/sh
#SBATCH --time=48:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --job-name=name
#SBATCH --output=name
#SBATCH -p name
for system in `cat ${system_file}`; do
srun --exclusive -N1 -n1 bash script.sh ${system} &
done
wait
Normally it writes the terminal output to the file specified under #SBATCH --output=. The problem is that the output of different tasks overlaps because multiple tasks run at once. A program called in this script emits an error, and I need to track down which input variable is actually associated with that error.
What I need is to write a separate output file for each iteration of this for loop so that there is no overlap.
I tried including %s and %t in the output file names because I thought each iteration of the for loop might have a different step or task id, but this still only feeds to one output file.
Upvotes: 2
Views: 2610
Reputation: 121
The #SBATCH --output=name statement is used by Slurm to write messages for the job as a whole, including from each srun if no specific output is provided for them.
To get unique output from each srun, you must include the option --output with srun, not sbatch, e.g.:
#SBATCH --ntasks=24
for system in `cat ${system_file}`; do
srun --exclusive --output ${system}-%j-%t-%s.out bash script.sh ${system} &
done
In the file name, %j expands to the job ID, %t to the task ID within a step, and %s to the step ID, where each execution of srun in the loop is one step. Each task of a step runs on a different CPU (I think). By specifying %t and %s, you force the creation of a separate output file per task and per step. The total number of files will be #tasks × #steps, here 24 × the number of entries in ${system_file}. ${system} and %s will always correlate in this case, so only one or the other is necessary.
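By the way, the options -N1 and -n1 do matter here: as far as I know, srun inherits the task count of the job (--ntasks=24) when -n is omitted, so without them each srun launches 24 copies of the script, which is where the factor of 24 in the file count comes from. Keep -N1 -n1 on each srun if every input should run exactly once.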
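For reference, a minimal sketch of the loop with one task and one output file per step. It assumes ${system_file} holds one system name per line and that bash is available; launch_steps and the SRUN override are hypothetical names added only so the loop can be dry-run outside of a Slurm job.

```shell
#!/bin/bash
# launch_steps FILE: start one job step per non-empty line of FILE,
# each step writing to its own <system>-<jobid>-<stepid>.out file.
# SRUN is a hypothetical hook (defaults to srun) for dry runs.
launch_steps() {
    local system_file=$1 system
    while IFS= read -r system; do
        [ -n "$system" ] || continue      # skip blank lines
        # -N1 -n1 are given explicitly so each step runs exactly one task.
        ${SRUN:-srun} --exclusive -N1 -n1 \
            --output="${system}-%j-%s.out" \
            bash script.sh "$system" &
    done < "$system_file"
    wait                                  # block until every step has finished
}
```

Inside the batch script this would be called as launch_steps "${system_file}" after the #SBATCH lines. Setting SRUN=echo prints the commands instead of running them.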
Upvotes: 2
Reputation: 59260
To ease the process of identifying which instance dropped the error, you can add the -l option to srun to prepend the task ID to the line. From the srun manpage:
-l, --label Prepend task number to lines of stdout/err. The --label option will prepend lines of output with the remote task id. This option applies to step allocations.
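One caveat for the loop in the question: each srun there starts a single task, so the task ID is presumably 0 in every step and the label alone would not tell the runs apart. A minimal sketch of a shell-side alternative that prefixes every line with the input variable instead; label_lines is a hypothetical helper, not a Slurm feature.

```shell
#!/bin/bash
# label_lines TAG: copy stdin to stdout, prefixing every line with "TAG: ".
label_lines() {
    local tag=$1 line
    # The "|| [ -n "$line" ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s: %s\n' "$tag" "$line"
    done
}
```

In the loop it could be used as srun --exclusive -N1 -n1 bash script.sh "${system}" 2>&1 | label_lines "${system}" & so that lines in the shared job output can be matched to their input.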
If you want to create one file per task, though, you will need to do it by redirecting the output explicitly in the submission script. For instance:
#!/bin/sh
#SBATCH --time=48:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --job-name=name
#SBATCH --output=name
#SBATCH -p name
for system in `cat ${system_file}`; do
srun --exclusive -N1 -n1 bash script.sh ${system} > name.${system}.out 2>&1 &
done
wait
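Once every run has its own name.<system>.out file, finding the input behind the error becomes a search over those files. A minimal sketch, assuming the file naming used above and that the error message contains a known string; failing_systems is a hypothetical helper name.

```shell
#!/bin/bash
# failing_systems PATTERN: print the system name of every name.<system>.out
# file in the current directory that contains PATTERN.
failing_systems() {
    local pattern=$1 f
    for f in name.*.out; do
        [ -e "$f" ] || continue           # no matching files at all
        if grep -q -- "$pattern" "$f"; then
            f=${f#name.}                  # strip the "name." prefix
            printf '%s\n' "${f%.out}"     # strip the ".out" suffix
        fi
    done
}
```

For example, failing_systems 'Segmentation fault' run in the job's working directory would list the systems whose output contains that message.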
Upvotes: 1