Reputation: 734
I'm starting a SLURM job with a script, and the script must work depending on its location, which is obtained inside the script itself with SCRIPT_LOCATION=$(realpath $0). But SLURM copies the script to a slurmd folder and starts the job from there, which breaks the further actions.
Is there any option to get the location of the script used for the SLURM job before it has been moved/copied?
The script is located in a network shared folder /storage/software_folder/software_name/scripts/this_script.sh and it must:
1. Copy the software_name folder to a local folder /node_folder on the node.
2. Run /node_folder/software_name/scripts/launch.sh.
My script is:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# getting the location of the software_name folder
SHARED_PATH=$(dirname "$(dirname "$(realpath "$0")")")
# separating the software_name from the path
SOFTWARE_NAME=$(basename "$SHARED_PATH")
# target location to copy the project to
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for the target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# copying the software folder from network storage to local
cp -r "$SHARED_PATH" "$LOCAL_SOFTWARE_FOLDER"
# running the script
sh "$LOCAL_PATH/scripts/launch.sh"
It runs perfectly when I run it on the node itself (without using SLURM) via sh /storage/software/scripts/this_script.sh.
In case of running it with SLURM as sbatch /storage/software/scripts/this_script.sh, it is assigned to one of the nodes, but the script is copied to /var/spool/slurmd/job_number/slurm_script and run from there, which breaks everything, since $(dirname $(dirname $(realpath $0))) then returns /var/spool/slurmd.
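To make the failure concrete, here is the path arithmetic as the script sees it when started by SLURM (job_number stands for the real job id):
realpath "$0"    -> /var/spool/slurmd/job_number/slurm_script
dirname once     -> /var/spool/slurmd/job_number
dirname twice    -> /var/spool/slurmd (not the shared folder)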
Is it possible to get original location (/storage/software_folder/software_name/
) inside of script when it is started with SLURM?
P.S. All machines are running Fedora 30 (x64)
UPDATE 1
There was a suggestion to run it as sbatch -D /storage/software_folder/software_name ./scripts/this_script.sh and use SHARED_PATH="${SLURM_SUBMIT_DIR}" inside the script itself.
But it raises the error sbatch: error: Unable to open file ./scripts/this_script.sh.
Also, I tried to use absolute paths: sbatch -D /storage/software_folder/software_name /storage/software_folder/software_name/scripts/this_script.sh. It tries to run, but echo "${SLURM_SUBMIT_DIR}" inside the script prints /home/username_who_started_script instead of /storage/software_folder/software_name.
Any other suggestions?
UPDATE 2:
I also tried to use #SBATCH --chdir=/storage/software_folder/software_name inside the script, but in that case echo "${SLURM_SUBMIT_DIR}" returns /home/username_who_started_script or / (if run as root).
UPDATE 3
The approach with ${SLURM_SUBMIT_DIR} worked only if the task is run as:
cd /storage/software_folder/software_name
sbatch ./scripts/this_script.sh
But it doesn't seem to be a proper solution. Are there any other ways?
SOLUTION
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# check if the script was started via SLURM or bash:
# if started with SLURM, the variable $SLURM_JOB_ID will exist;
# `if [ -n "$SLURM_JOB_ID" ]` checks that $SLURM_JOB_ID is not an empty string
if [ -n "$SLURM_JOB_ID" ]; then
    # started with SLURM: look up the original location through scontrol and $SLURM_JOB_ID
    SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    # otherwise: started with bash. Get the real location.
    SCRIPT_PATH=$(realpath "$0")
fi
# getting the location of the software_name folder
SHARED_PATH=$(dirname "$(dirname "$SCRIPT_PATH")")
# separating the software_name from the path
SOFTWARE_NAME=$(basename "$SHARED_PATH")
# target location to copy the project to
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for the target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# copying the software folder from network storage to local
cp -r "$SHARED_PATH" "$LOCAL_SOFTWARE_FOLDER"
# running the script
sh "$LOCAL_PATH/scripts/launch.sh"
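With the branch in place, the same script can be started either way, for example (paths as in the question):
sbatch /storage/software_folder/software_name/scripts/this_script.sh
# or, without SLURM:
sh /storage/software_folder/software_name/scripts/this_script.sh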
Upvotes: 21
Views: 8484
Reputation: 3
In case of using SLURM's #SBATCH --array 1-N, the answer from @damienfrancois works if only the first path in the returned string is taken. The reply from @Karthik Govindappa did not work for me, but the approach from https://unix.stackexchange.com/a/53315 did, resulting in:
IFS=' ' read -r THEPATH _ <<< "$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')"
Here read keeps the first whitespace-separated token in THEPATH and discards the rest.
Upvotes: 0
Reputation: 59100
You can get the initial (i.e. at submit time) location of the submission script from scontrol like this:
scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}'
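For context, scontrol show job prints one key=value field per line, and the awk command picks out the value of the Command= field; the relevant part of the output looks roughly like this (job id and path shown for illustration):
JobId=123456 JobName=this_script.sh
...
Command=/storage/software_folder/software_name/scripts/this_script.sh
...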
So you can replace the realpath $0 part with the above. This will only work within a Slurm allocation, of course, so if you want the script to work in any situation, you will need some logic like:
if [ -n "${SLURM_JOB_ID:-}" ] ; then
THEPATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
THEPATH=$(realpath "$0")
fi
and then proceed with
SHARED_PATH=$(dirname "$(dirname "${THEPATH}")")
Upvotes: 16
Reputation: 21
I had to do the same in an array job. The accepted answer from @damienfrancois works well for all jobs except the one whose job id is the same as the ArrayJobId. Just piping the awk command into head does the trick:
scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}' | head -n 1
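For illustration, when the queried id is the same as the ArrayJobId, scontrol may print one record per array task, so the awk filter emits several identical paths and head -n 1 keeps only the first (output sketched under that assumption):
/storage/software_folder/software_name/scripts/this_script.sh
/storage/software_folder/software_name/scripts/this_script.sh
...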
Upvotes: 2
Reputation: 179
In the script, get SHARED_PATH as SHARED_PATH="${SLURM_SUBMIT_DIR}".
Submit the script as sbatch -D /storage/software ./scripts/this_script.sh.
See the sbatch documentation.
From the referred page:
-D
Set the working directory of the batch script to directory before it is executed. The path can be specified as full path or relative path to the directory where the command is executed.
SLURM_SUBMIT_DIR
The directory from which sbatch was invoked or, if applicable, the directory specified by the -D, --chdir option.
P.S. The above is from the version 19.05 docs.
Looking into the archive for version 18.x (esp. 18.08), it doesn't mention the same. There it says only:
SLURM_SUBMIT_DIR
The directory from which sbatch was invoked.
Upvotes: 0