Alex Rothberg

Reputation: 10993

Ensuring one Job Per Node on StarCluster / SunGridEngine (SGE)

When submitting jobs with qsub on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives at most one job at a time? I am having problems where multiple jobs land on the same node, leading to out-of-memory (OOM) errors.

I tried using -l cpu=8, but I believe that checks only the total number of cores on the box, not the number of cores actually in use.

I also tried -l slots=8, but then I get:

Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.
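
Presumably SGE wants the request to go through a parallel environment instead, i.e. something like:

qsub -pe smp 8 job.sh

(assuming a parallel environment named smp has been created, e.g. with qconf -ap smp), but I am looking for a simpler way to guarantee one job per node.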

Upvotes: 4

Views: 1859

Answers (3)

Tobias

Reputation: 56

In your config file (.starcluster/config) add this section:

[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
slots_per_host = 1
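
For this to take effect, the plugin also has to be listed in your cluster template; a minimal sketch, assuming your template is named smallcluster:

[cluster smallcluster]
plugins = sge

With slots_per_host = 1, each node advertises a single slot, so the scheduler can place at most one single-slot job on each node at a time.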

Upvotes: 4

Alex Rothberg

Reputation: 10993

I accomplished this by setting the number of slots on each of my nodes to 1 using:

qconf -aattr queue slots "[nodeXXX=1]" all.q
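
A sketch for applying the same override to every execution host at once, assuming the default all.q queue (qconf -sel lists the configured exec hosts):

# give every execution host exactly one slot in all.q
for host in $(qconf -sel); do
    qconf -aattr queue slots "[$host=1]" all.q
done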

Upvotes: -1

Vince

Reputation: 3395

This largely depends on how the cluster's resources are configured, i.e. memory limits, etc. However, one thing to try is to request most of a node's memory for each job:

-l h_vmem=xxG

This has the side effect of excluding other jobs from running on the node, since most of the node's memory will already be claimed by the previously scheduled job.
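
For example, a sketch assuming nodes with 64 GB of RAM, that h_vmem is configured as a consumable resource on the cluster, and a hypothetical job script job.sh:

# request nearly all of a node's memory so no second job fits
qsub -l h_vmem=60G job.sh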

Just make sure the memory you request does not exceed the allowable limit for the node. You can check whether your request exceeds this limit by looking at the output of qstat -j <jobid> for errors.

Upvotes: 1
