Reputation: 11168
What is AssocGrpNodeLimit? The squeue command shows it listed as the "reason" my job is not running yet. I'm surprised, because some of the nodes are idle. My priority is the highest I've ever seen it (2126). I've Googled and Binged it, and I found it as a return value in slurm_protocol_defs.c:
/* Given a job's reason for waiting, return a descriptive string */
extern char *job_reason_string(enum job_state_reason inx)
{
    ...
    case WAIT_ASSOC_GRP_NODE:
        return "AssocGrpNodeLimit";
Based on the words and word fragments comprising "AssocGrpNodeLimit", I'm guessing that someone associated with the same group as me is using too many nodes so my job won't run?
Upvotes: 5
Views: 3976
Reputation: 5377
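AssocGrpNodeLimit means that the association the job was submitted under has reached its limit on the number of nodes. The limit applies to the association as a whole, so the job can stay pending even while some nodes in the cluster are idle.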
You can check the limit with sacctmgr show assoc and, if the administrators have not restricted it, you can also list the jobs of a particular account with squeue -A <account_name>
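To get a rough idea of how many nodes an account currently holds, one option is to total the node counts of its running jobs. The sketch below assumes squeue's -h (no header), -t RUNNING (state filter) and -o "%D" (node count) options, and that the counts are printed as plain integers. sum_nodes is a hypothetical helper name, not part of Slurm.

```shell
# sum_nodes: hypothetical helper that adds up one node count per input line.
# Prints 0 when there is no input (for example, no running jobs).
sum_nodes() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage on a cluster (requires Slurm; replace <account_name>):
#   squeue -A <account_name> -h -t RUNNING -o "%D" | sum_nodes
```

Comparing that total with the limit reported by sacctmgr should show whether the account is already at its cap.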
Definition of the term association from http://slurm.schedmd.com/sacctmgr.html (emphasis in the original):
Slurm account information is recorded based upon four parameters that form what is referred to as an association. These parameters are user, cluster, partition, and account. user is the login name. cluster is the name of a Slurm managed cluster as specified by the ClusterName parameter in the slurm.conf configuration file. partition is the name of a Slurm partition on that cluster. account is the bank account for a job.
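The four parameters quoted above can be printed next to the node limit of each association. This is only a sketch: it assumes a Slurm version that reports group limits in the GrpTRES column as a comma-separated list such as cpu=64,node=4 (older releases used a separate GrpNodes column), and node_limits is a hypothetical helper name.

```shell
# node_limits: hypothetical helper. Reads pipe-delimited lines of
#   Cluster|Account|User|Partition|GrpTRES
# and prints the four association parameters followed by the node
# limit, or "none" when GrpTRES has no node= entry.
node_limits() {
    awk -F'|' '{
        limit = "none"
        n = split($5, tres, ",")
        for (i = 1; i <= n; i++)
            if (tres[i] ~ /^node=/) limit = substr(tres[i], 6)
        print $1, $2, $3, $4, limit
    }'
}

# Usage on a cluster (requires Slurm):
#   sacctmgr -n -P show assoc format=Cluster,Account,User,Partition,GrpTRES | node_limits
```

Here -n suppresses the header and -P selects pipe-delimited output, so each line maps directly onto one association.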
Upvotes: 4