Reputation: 5067
I'm on CentOS 6.9 running Slurm 17.11.7. I've modified my /gpfs0/export/slurm/conf/epilog
script. Ultimately, I would like to print job resource utilization information to the stdout file used by each user's job.
I've been testing it within the conditional at the end of the script for myself before I roll it out to other users. Below is my modified epilog
script:
#!/bin/bash
# Clear out TMPDIR on the shared file system after job completes
exec >> /var/log/epilog.log
exec 2>> /var/log/epilog.log
if [ -z "$SLURM_JOB_ID" ]
then
echo -e " This script should be executed from slurm."
exit 1
fi
TMPDIR="/gpfs0/scratch/${SLURM_JOB_ID}"
rm -rf "$TMPDIR"
### My additions to the existing script ###
if [ "$USER" == "myuserid" ]
then
STDOUT=`scontrol show jobid ${SLURM_JOB_ID} | grep StdOut | awk 'BEGIN{FS="="}{print $2}'`
# Regular stdout/stderr is not respected, must use python.
python -c "import sys; stdout=sys.argv[1]; f=open(stdout, 'a'); f.write('sticks\n'); f.close();" ${STDOUT}
fi
exit 0
From the Prolog and Epilog section of the slurm.conf user manual it seems that stdout/stderr are not respected. Hence I modify the stdout file with python.
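As a side note, the StdOut extraction can be exercised outside of Slurm. Below is a minimal sketch of the same awk parsing, using a made-up sample line in place of real scontrol output (the path and job id are assumptions; a real epilog would pipe scontrol show jobid ${SLURM_JOB_ID} into the same awk command):

```shell
# Parse a StdOut= line of the kind `scontrol show jobid <id>` prints.
# The sample line below is a stand-in for real scontrol output.
line="   StdOut=/gpfs0/home/myuserid/slurm-12345.out"
stdout_path=$(echo "$line" | awk 'BEGIN{FS="="}/StdOut/{print $2}')
echo "$stdout_path"
```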
I've picked the compute node node21 to run this job, so I logged into node21 and tried several things to get it to notice my changes to the epilog script.
Reconfiguring slurmd:
sudo scontrol reconfigure
Restarting the slurm daemon:
sudo service slurm stop
sudo service slurm start
Neither of these seems to pick up my changes to the epilog script when I submit jobs. Yet when I put the same conditional in a batch script, it runs flawlessly:
#!/bin/bash
#SBATCH --nodelist=node21
echo "Hello you!"
echo $HOSTNAME
if [ "$USER" == "myuserid" ]
then
STDOUT=`scontrol show jobid ${SLURM_JOB_ID} | grep StdOut | awk 'BEGIN{FS="="}{print $2}'`
python -c "import sys; stdout=sys.argv[1]; f=open(stdout, 'a'); f.write('sticks\n'); f.close();" ${STDOUT}
#echo "HELLO! ${USER}"
fi
QUESTION : Where am I going wrong?
EDIT : This is an MWE from within the context of trying to print resource utilization of jobs at the end of the output.
Upvotes: 0
Views: 1812
Reputation: 21
According to this page, you can print to stdout from the Slurm prolog by prefacing your output with the 'print' command.
For example, instead of
echo "Starting prolog"
You need to do
echo "print Starting Prolog"
Unfortunately this only seems to work for the prolog, not the epilog.
Upvotes: 0
Reputation: 5067
To get this, append the following to the end of the epilog script:
# writing job statistics into job output
OUT=`scontrol show jobid ${SLURM_JOB_ID} | grep StdOut | awk 'BEGIN{FS="="}{print $2}'`
echo -e "sticks" >> "${OUT}" 2>&1
There was no need to restart the slurm daemons. Additional commands can be added to it to get resource utilization, e.g.
sleep 5s ### Sleep to give the job a chance to be written to the Slurm database for job statistics.
sacct --units M --format=jobid,user%5,state%7,CPUTime,ExitCode%4,MaxRSS,NodeList,Partition,ReqTRES%25,Submit,Start,End,Elapsed -j $SLURM_JOBID >> "$OUT" 2>&1
Basically, you can still append to the output file using >>. Evidently, it did not occur to me that regular output redirection still works. It is still unclear why the Python statement above did not work.
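For illustration, here is the append pattern in isolation, runnable outside of Slurm. A temporary file stands in for the job's stdout path that scontrol would report (the second echo is a placeholder for the sacct output):

```shell
# Sketch of the epilog's append pattern; OUT would normally be the
# StdOut path from scontrol, here a temp file stands in for it.
OUT=$(mktemp)
echo "sticks" >> "$OUT" 2>&1
echo "job statistics would go here" >> "$OUT" 2>&1   # e.g. sacct output
cat "$OUT"
rm -f "$OUT"
```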
Upvotes: 1