Danny Weiss

Reputation: 31

Submitted jobs in Slurm not appearing in squeue, not getting scheduled

I am attempting to build my own computer cluster (perhaps a Beowulf, though apparently throwing that term around willy-nilly isn't cool) and have installed Slurm as my scheduler. Everything appears fine when I run sinfo:

danny@danny5:~/Cluster/test$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      5   idle danny[1-5]
danny@danny5:~/Cluster/test$ 

However, if I try to submit a job using the following script

danny@danny5:~/Cluster/test$ cat script.sh
#!/bin/bash -l
#SBATCH --job-name=JOBNUMBA0NE
#SBATCH --time=00-00:01:00
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100
#SBATCH -o stdout
#SBATCH -e stderr
#SBATCH --mail-type=END
#SBATCH --mail-user=dkweiss@wesleyan.edu

gfortran -O3 -i8 0-hc1.f

./a.out

I receive a lovely Submitted batch job 6, but nothing appears in squeue, and none of the expected output files materialize (the a.out executable doesn't even appear). Here is the output of scontrol show partition:

danny@danny5:~/Cluster/test$ scontrol show partition
PartitionName=debug
   AllocNodes=ALL AllowGroups=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 MaxCPUsPerNode=UNLIMITED
   Nodes=danny[1-5]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8 TotalNodes=5 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Any ideas?

Upvotes: 3

Views: 10980

Answers (3)

4th_haim_sister

Reputation: 61

This happened to me when the log folder did not exist (it had not been created beforehand). Slurm does not create output directories for you, so the job silently fails when it cannot open its stdout/stderr files.
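A minimal sketch of the fix, assuming a hypothetical logs/ directory under the asker's working tree (adjust the path to wherever your -o/-e directives point):

```shell
# Pre-create the directory the job's stdout/stderr will be written to,
# on a filesystem visible to all compute nodes, before submitting.
logdir="$HOME/Cluster/test/logs"
mkdir -p "$logdir"

# Then reference that directory explicitly in the batch script, e.g.
#   #SBATCH -o /home/danny/Cluster/test/logs/stdout
#   #SBATCH -e /home/danny/Cluster/test/logs/stderr
sbatch script.sh
```

The relative paths -o stdout / -e stderr in the original script resolve against the submission directory, so this only bites when that directory (or a custom log directory) is missing or not shared with the compute nodes.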

Upvotes: 5

user2604899

Reputation: 109

I had the same problem. I suppose there could be more reasons why jobs just disappear without any feedback, but in my case Slurm simply lacked the necessary privileges. Therefore:

  1. Try running sbatch with sudo; if it succeeds, this is probably the same issue.
  2. If you are not able to try that, at least set the output and error file paths explicitly and make sure that Slurm is able to write there.
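Step 2 can be checked with a quick write test. This is only a sketch; the daemon account name (slurm) and the directory are assumptions you should adjust to your setup:

```shell
# Check whether the account running the Slurm daemons can write to the
# directory where the job's stdout/stderr would land.
jobdir="$HOME/Cluster/test"
if sudo -u slurm test -w "$jobdir"; then
    echo "slurm user can write to $jobdir"
else
    echo "slurm user CANNOT write to $jobdir"
fi
```

If the directory is not writable, the job can vanish exactly as described, since the output files can never be opened.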

Upvotes: 3

damienfrancois

Reputation: 59330

I have seen that behaviour when the user submitting the job (here danny) does not exist with the same UID on the compute nodes. Make sure id danny reports the same output on all Slurm-related nodes, and look for confirmation in the compute nodes' slurmd log file.
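A quick way to run that check across the cluster, using the hostnames from the sinfo output above (the loop assumes passwordless SSH between nodes, which a Beowulf-style setup typically has):

```shell
# Print the UID/GID of the submitting user on every node; all lines
# should be identical. Hostnames taken from the danny[1-5] nodelist.
for host in danny1 danny2 danny3 danny4 danny5; do
    printf '%s: ' "$host"
    ssh "$host" id danny
done
```

If any node reports a different UID (or "no such user"), slurmd on that node cannot set up the job's credentials and the job is discarded, which matches the silent disappearance described in the question.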

Upvotes: 2
