Reputation: 646
I use Google Cloud Platform for my project.
Now, I have a cluster with 4 node pools:
- "micro-pool": with minimal machines for managing the cluster
- "cpu-pool": with cpu-only machines for processes that don't need a GPU
- 2 "gpu-pools": two pools with machines that have GPUs attached.
What I need is for my CPU processes to never run on a GPU machine: they take a long time, and running them on a GPU machine just costs money for nothing.
I run my pods using the following command:
kubectl run dc-1 --image={image-name} --replicas=1 --restart=Never --limits="nvidia.com/gpu=0,cpu=4000m,memory=2Gi" -- bash -c "command to execute"
This works fine if no GPU machines are left over from previous GPU runs. But if there was a very recent GPU run, the pod gets scheduled on that instance because it meets the minimum CPU and memory requirements. I thought --limits="nvidia.com/gpu=0" would do the trick, but obviously it didn't.
What should I do?
Upvotes: 0
Views: 992
Reputation: 541
This is a good use case for taints and tolerations. You can taint the GPU nodes with NoSchedule. This will prevent pods (even system pods) that don't have a toleration for that taint from being scheduled on the GPU nodes:
kubectl taint nodes gpuNode1 nodetype=gpu:NoSchedule
Then, on pods you do want to run on these nodes, you can add a toleration for the taint:
tolerations:
- key: "nodetype"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
I'm not sure about GCP, but on Azure's AKS you can configure the taint when you create the cluster and the node pools.
Edit:
You will want to combine this with Harsh Manvar's suggestion of node selectors and/or affinity. A toleration only allows your pod onto the tainted GPU nodes; it does not guarantee the pod will be scheduled there. The taint just keeps everything else off.
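To make the combination concrete, here is a minimal sketch of a GPU pod that both tolerates the nodetype=gpu taint and selects a GPU node pool. The pod name, the pool name gpu-pool-1, and the single-GPU limit are assumptions for illustration; {image-name} is the placeholder from the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                  # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: gpu-job
    image: "{image-name}"        # placeholder from the question
    resources:
      limits:
        nvidia.com/gpu: 1        # assumed: one GPU per pod
  # Allows the pod onto nodes tainted with nodetype=gpu:NoSchedule
  tolerations:
  - key: "nodetype"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  # Steers the pod to the GPU pool; the pool name is an assumption
  nodeSelector:
    cloud.google.com/gke-nodepool: gpu-pool-1
```

CPU pods need no changes for the taint to take effect: without the toleration they simply cannot be scheduled on the tainted nodes.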
Upvotes: 1
Reputation: 30083
If you want to assign a pod to a particular instance or node, you can use the Kubernetes node selector.
For example:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Here the pod is assigned based on the node selector, which is the disk type: it will only be scheduled on nodes labeled disktype=ssd.
You can also check this URL for further documentation: https://kubernetes.io/docs/concepts/configuration/assign-pod-node
Edit 1:
Since you are on GCP, you can also select by node pool:
nodeSelector:
  # <labelname>: value
  cloud.google.com/gke-nodepool: pool-highcpu8  # pool name
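Applied to the setup in the question, a minimal sketch of the CPU pod as a manifest could look like the following. It assumes the node pool is literally named cpu-pool, as in the question, and reuses the name, limits, and command from the asker's kubectl run line; {image-name} remains a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dc-1
spec:
  restartPolicy: Never
  containers:
  - name: dc-1
    image: "{image-name}"        # placeholder from the question
    command: ["bash", "-c", "command to execute"]
    resources:
      limits:
        cpu: 4000m
        memory: 2Gi
  # Only nodes in this GKE node pool are eligible; pool name assumed
  nodeSelector:
    cloud.google.com/gke-nodepool: cpu-pool
```

Saved to a file (for example a hypothetical dc-1.yaml), this can be created with kubectl apply -f dc-1.yaml. With the selector in place the pod stays Pending until a cpu-pool node is available rather than landing on a leftover GPU machine.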
Edit 2:
If you are familiar with affinity and anti-affinity, you can use those as well.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/node-type
            operator: In
            values:
            - gpu
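For CPU:
spec:
  affinity:
    podAntiAffinity:
      # weight/podAffinityTerm are only valid under the "preferred" rule
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: resources
              operator: In
              values:
              - cpu-only
          # topologyKey is required; per-node scope is assumed here
          topologyKey: kubernetes.io/hostname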
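Node affinity can also express "never on a GPU node" directly for the CPU pods. The fragment below is a sketch under the assumption that your GPU nodes carry the cloud.google.com/gke-accelerator label, which GKE normally sets on GPU node pools; you can verify this with kubectl get nodes --show-labels before relying on it.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Assumed label: present only on nodes with GPUs attached
          - key: cloud.google.com/gke-accelerator
            operator: DoesNotExist
```

Unlike a node selector on a single pool name, this excludes every GPU pool at once, so it keeps working if more GPU pools are added later.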
Upvotes: 1