Reputation: 3094
How do I find the driver version of the node in Autopilot?
I need the 525 driver version on the node - but I suspect it's 470.
Is there a way to specify a nodeSelector to provision nodes with the 525 version of the driver?
Upvotes: 2
Views: 1519
Reputation: 16346
As noted in the other comments, you can query the logs of the DaemonSet which installs the drivers.
As there are a few instances of this DaemonSet for different sized nodes, here's one way to find a running Pod instance to query:
$ kubectl get pods -n kube-system | grep nvidia
nvidia-gpu-device-plugin-small-cos-fjjj4 1/1 Running 0 141m
$ kubectl logs nvidia-gpu-device-plugin-small-cos-fjjj4 -n kube-system | grep Driver
Defaulted container "nvidia-gpu-device-plugin" out of: nvidia-gpu-device-plugin, nvidia-driver-installer (init), partition-gpus (init)
I1204 01:49:36.618905 5680 metrics.go:144] nvml initialized successfully. Driver version: 535.104.12
Per this output, we can see that the driver version is 535.104.12.
This version is set by Autopilot and cannot be changed at present.
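If several of these Pods are running, a small helper can pull the version out of each log instead of grepping by hand. This is only a sketch: it assumes the log line keeps the "Driver version: X.Y.Z" format shown above and that the container is named nvidia-gpu-device-plugin as in that output. The function names extract_driver_version and list_driver_versions are made up for illustration.

```shell
#!/usr/bin/env bash

# Read device-plugin log lines on stdin and print each distinct driver version.
# Assumes the "Driver version: X.Y.Z" wording seen in the log output above.
extract_driver_version() {
  sed -n 's/.*Driver version: \([0-9][0-9.]*\).*/\1/p' | sort -u
}

# Hypothetical wrapper: collect logs from every nvidia Pod in kube-system
# and report the driver versions found. Pods without the assumed container
# name are skipped silently.
list_driver_versions() {
  kubectl get pods -n kube-system -o name | grep nvidia | while read -r pod; do
    kubectl logs "$pod" -n kube-system -c nvidia-gpu-device-plugin 2>/dev/null
  done | extract_driver_version
}
```

After sourcing the file, running list_driver_versions against the cluster should print one line per distinct driver version found across the GPU nodes.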
Upvotes: 0
Reputation: 1543
In Autopilot clusters, GKE manages driver version selection and installation. If you need the list of GPU driver versions associated with a GKE version, refer to the corresponding Container-Optimized OS page linked in the GKE current versions table.
For example, if you have selected GKE version 1.25.7-gke.1000, the available COS version is cos-101-17162-127-27 and the supported GPU driver versions are v470.182.03 (default) and v525.105.17.
You can follow this documentation to deploy your GPU workloads on an Autopilot cluster.
Edit 1: The steps below apply to Standard clusters only.
After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that you can apply to install the drivers. On GPU nodes that use Container-Optimized OS images, you also have the option of selecting between the default GPU driver version and a newer version.
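On a Standard cluster with Container-Optimized OS nodes, the choice between default and newer driver could be sketched as below. The manifest URLs are an assumption based on the GoogleCloudPlatform/container-engine-accelerators repository layout referenced by the GKE docs, so verify them against the current documentation before applying. The function name driver_installer_manifest is hypothetical.

```shell
#!/usr/bin/env bash

# Assumed base path of the COS driver-installer manifests; check the GKE docs.
BASE_URL="https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos"

# Print the manifest URL for the requested driver channel ("default" or "latest").
driver_installer_manifest() {
  case "$1" in
    default) echo "$BASE_URL/daemonset-preloaded.yaml" ;;
    latest)  echo "$BASE_URL/daemonset-preloaded-latest.yaml" ;;
    *)       echo "usage: driver_installer_manifest default|latest" >&2; return 1 ;;
  esac
}

# Example (Standard clusters only; not run automatically):
#   kubectl apply -f "$(driver_installer_manifest latest)"
```

Which driver version each manifest actually installs depends on the COS image of the node, so check the result with the log query shown in the other answer.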
Note: This content is taken from the official Google Cloud documentation linked above.
Upvotes: 3