I have a Kubernetes cluster with 2 worker nodes. One has 2 GPUs and the other is a CPU-only system. I have a pipeline component that uses the Katib Python SDK's tune(). In its parameters I am passing resources_per_trial = {"gpu":1}, but I get this error:
0/3 nodes are available: 3 Insufficient nvidia.com/gpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
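As a rough illustration of what the scheduler is being asked for: the sketch below assumes that tune() turns a {"gpu": 1} entry into the Kubernetes extended resource nvidia.com/gpu (the resource named in the error) and uses it as both request and limit. The helper to_trial_resources is hypothetical and is not part of the Katib SDK.

```python
from typing import Dict, Union


def to_trial_resources(
    resources_per_trial: Dict[str, Union[int, str]]
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper (not the Katib SDK): map a short-hand resource dict
    to a Kubernetes-style resources block, assuming "gpu" means nvidia.com/gpu."""
    # Kubernetes resource quantities are written as strings, e.g. "1" or "2Gi".
    resources = {key: str(value) for key, value in resources_per_trial.items()}
    # Assumption: the short key "gpu" is renamed to the NVIDIA extended resource.
    if "gpu" in resources:
        resources["nvidia.com/gpu"] = resources.pop("gpu")
    # Extended resources cannot be overcommitted, so request and limit are equal.
    return {"requests": dict(resources), "limits": dict(resources)}


if __name__ == "__main__":
    print(to_trial_resources({"gpu": 1}))
    # {'requests': {'nvidia.com/gpu': '1'}, 'limits': {'nvidia.com/gpu': '1'}}
```

Under this assumption, every trial pod asks for one whole nvidia.com/gpu, so it can only be placed on a node that advertises at least one free unit of that resource.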
When I check the 2 nodes, I can see that the pod created for this component is running on the CPU-only node, not on the GPU node.
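The "Insufficient nvidia.com/gpu" part of the scheduler message is reported per node, and it is evaluated against each node's allocatable resources. A minimal sketch for listing which nodes advertise nvidia.com/gpu, assuming the official kubernetes Python client and a working kubeconfig; nodes_with_gpu and fetch_allocatable are hypothetical helper names.

```python
from typing import Dict, List


def nodes_with_gpu(
    allocatable_by_node: Dict[str, Dict[str, str]],
    resource: str = "nvidia.com/gpu",
) -> List[str]:
    """Return the names of nodes whose allocatable resources include at least one GPU."""
    return sorted(
        name
        for name, allocatable in allocatable_by_node.items()
        # GPU counts are whole numbers, reported as strings such as "2".
        if int(allocatable.get(resource, "0")) > 0
    )


def fetch_allocatable() -> Dict[str, Dict[str, str]]:
    """Read status.allocatable for every node (needs the `kubernetes`
    package and a kubeconfig with permission to list nodes)."""
    # Imported lazily so nodes_with_gpu can be used without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    nodes = client.CoreV1Api().list_node().items
    return {node.metadata.name: dict(node.status.allocatable or {}) for node in nodes}
```

Against the cluster, nodes_with_gpu(fetch_allocatable()) returning an empty list would be consistent with the error reporting "3 Insufficient nvidia.com/gpu" on all 3 nodes, i.e. the GPU node is not exposing that resource to the scheduler.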
The component is declared as:
@component(base_image=<gpu image name>, packages_to_install=['kubeflow-katib','git+https://github.com/kubeflow/katib.git@master#subdirectory=sdk/python/v1beta1'])
def model_tuning():