user187785

Reputation: 21

How to time a SLURM job array?

I am submitting a SLURM job array and want to have the total runtime (i.e. not the runtime of each task) printed to the log.

This is what I tried:

#!/bin/bash

#SBATCH --job-name=step1
#SBATCH --output=logs/step1.log
#SBATCH --error=logs/step1.log
#SBATCH --array=0-263%75

start=$SECONDS

python worker.py "${SLURM_ARRAY_TASK_ID}"

echo "Completed step1 in $((SECONDS - start)) seconds"

What I get in step1.log is something like this:

Completed step1 in 42 seconds
Completed step1 in 94 seconds
Completed step1 in 88 seconds
...

These appear to be the runtimes of the last group of tasks in the array. I want a single timer for the whole array, from submission to the end of the last task. Is that possible?

Upvotes: 2

Views: 1669

Answers (1)

ciaron

Reputation: 1169

With job arrays, each task is a separate run of the same script, so a timer inside the script can only ever measure per-task time, as you're seeing. To get the overall elapsed time of the entire job array, you'll need to get the submit time of the first task and subtract it from the end time of the last task to finish. Because tasks run concurrently, the last one to finish is not necessarily the one with the highest index.

e.g.

# get submit time for first task in array
# (-X: show the job allocation only, not its steps; -n: no header line)
sacct -j <job_id>_0 -X -n --format=submit

# get end time for last task in array
sacct -j <job_id>_263 -X -n --format=end

Then use date -d <timestamp from sacct> +%s to convert the timestamps to seconds since the epoch, to make them easier to subtract.
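Putting those steps together, here is a minimal sketch rather than a definitive script. It assumes GNU date and that sacct accounting is enabled on your cluster. The function names elapsed_seconds and array_elapsed are made up for illustration. Instead of assuming which task finished last, array_elapsed queries the whole array and takes the earliest submit time and the latest end time:

```shell
#!/bin/bash
# Sketch: total wall time of a job array, from submission to the end of
# the last task. Assumes GNU date and a working sacct.

# Convert two sacct-style timestamps (e.g. 2024-01-01T10:00:00) to
# seconds since the epoch and print the difference.
elapsed_seconds() {
    local start_epoch end_epoch
    start_epoch=$(date -d "$1" +%s) || return 1
    end_epoch=$(date -d "$2" +%s) || return 1
    echo $(( end_epoch - start_epoch ))
}

# Hypothetical helper: look up every task of the array and use the
# earliest Submit and the latest End. ISO timestamps sort correctly as
# plain text. If a task is still running, End is not a valid timestamp
# and date fails, so the function returns non-zero.
array_elapsed() {
    local job_id=$1 submit end
    submit=$(sacct -j "$job_id" -X -n -P --format=Submit | sort | head -n 1)
    end=$(sacct -j "$job_id" -X -n -P --format=End | sort | tail -n 1)
    elapsed_seconds "$submit" "$end"
}
```

After the array has finished, you would source the file and run array_elapsed <job_id>, e.g. echo "Completed step1 in $(array_elapsed <job_id>) seconds".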

Also note that each of your 264 tasks will overwrite step1.log with its own output. I would typically use #SBATCH --output=step1-%A_%a.out to distinguish outputs from different tasks.

Upvotes: 4
