Reputation: 2175
I want to run multiple jobs in a kubernetes cluster, but the total resource requirements exceed the size of the cluster, and the requirements of one job span multiple nodes. How do I avoid a livelock where all jobs have some resources, but none have enough to complete?
For example, suppose I have 4 nodes, each with 1 GB of memory available. I want to submit 2 jobs, each of which requires 3 GB of memory to complete, split across 3 pods that each require 1 GB. The correct solution here is to run the jobs sequentially; how do I ensure that happens?
I want to avoid the situation where both jobs schedule two pods each, using up the entire cluster, while the remaining pod of each job is stuck in the Pending state because no more resources are available. Since neither job can complete with only 2 GB of memory, the system can no longer make progress.
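For concreteness, one of the jobs might look like the sketch below (the name, image, and command are placeholders). Submit two of these into the 4-node/4 GB cluster and each can get two pods scheduled before both wedge:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-a                  # placeholder name
    spec:
      completions: 3               # all 3 pods must finish
      parallelism: 3               # run them concurrently
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: busybox                       # placeholder image
            command: ["sh", "-c", "sleep 60"]    # placeholder workload
            resources:
              requests:
                memory: 1Gi        # each pod claims a full node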
Some features I've looked at that don't seem to be suitable:
It looks like a custom scheduler is needed. kube-batch looks like a possible solution, as its PodGroup resource supports a minMember attribute for gang scheduling (see the sketch below). I will test this and post it as a self-answer, unless anyone can chime in with more detail.
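From what I can tell, it would look roughly like this (untested; the API group and annotation names are taken from the kube-batch README and may differ between releases). The PodGroup's minMember: 3 should keep the scheduler from placing any of the job's pods until all 3 fit:

    apiVersion: scheduling.incubator.k8s.io/v1alpha1
    kind: PodGroup
    metadata:
      name: job-a-group
    spec:
      minMember: 3                 # schedule all 3 pods or none
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-a
    spec:
      completions: 3
      parallelism: 3
      template:
        metadata:
          annotations:
            scheduling.k8s.io/group-name: job-a-group   # join the PodGroup
        spec:
          schedulerName: kube-batch    # hand pods to the kube-batch scheduler
          restartPolicy: Never
          containers:
          - name: worker
            image: busybox
            resources:
              requests:
                memory: 1Gi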
Upvotes: 0
Views: 462
Reputation: 2196
The easy solution is to assign each job a PriorityClass so that one job can preempt the other if needed:
https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
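For example (the class name and value are placeholders), you'd define a PriorityClass and reference it from the pod template of the job that should win:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: important-job          # placeholder name
    value: 1000000                 # higher values preempt lower ones
    globalDefault: false
    ---
    # Then, in the preferred Job's pod template:
    # spec:
    #   template:
    #     spec:
    #       priorityClassName: important-job

The scheduler evicts enough lower-priority pods to make room for the higher-priority job's pending pods, so that job always runs to completion first.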
However, this means one job will always have priority over the other. If you need them to run in the order they were received, you need a job queueing system. Here is one you can try:
https://github.com/kubernetes-sigs/kueue
Using kueue, you would create a Workload for each job as it comes in and add them all to the same LocalQueue.
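As a rough sketch (untested, based on the kueue docs; API versions and field names may have changed), you give the cluster's capacity to a ClusterQueue, point a LocalQueue at it, and label each Job with the queue name. Kueue then suspends incoming jobs, creates a Workload for each, and admits them only when quota is free:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: default-flavor
    ---
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}            # admit workloads from any namespace
      resourceGroups:
      - coveredResources: ["memory"]
        flavors:
        - name: default-flavor
          resources:
          - name: memory
            nominalQuota: 4Gi          # total capacity in the example above
    ---
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      name: user-queue
      namespace: default
    spec:
      clusterQueue: cluster-queue
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-a
      labels:
        kueue.x-k8s.io/queue-name: user-queue   # enqueue this job
    spec:
      suspend: true                    # kueue flips this when it admits the job
      completions: 3
      parallelism: 3
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: busybox
            resources:
              requests:
                memory: 1Gi

With a 4Gi quota, only one 3Gi job can be admitted at a time; the second stays queued until the first finishes, which avoids the livelock in the question.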
Upvotes: 1