GRS

Reputation: 3094

What is the Nvidia driver version of GKE Autopilot nodes?

How do I find the driver version of the node in Autopilot?

I need the 525 driver version on the node - but I suspect it's 470.

Is there a way to specify a nodeSelector to provision nodes with version 525 of the driver?

Upvotes: 2

Views: 1519

Answers (2)

William Denniss

Reputation: 16346

As noted in the other comments, you can query the logs of the DaemonSet which installs the drivers.

As there are a few instances of this DaemonSet for different sized nodes, here's one way to find a running Pod instance to query:

$ kubectl get pods -n kube-system | grep nvidia
nvidia-gpu-device-plugin-small-cos-fjjj4                   1/1     Running     0          141m

$ kubectl logs nvidia-gpu-device-plugin-small-cos-fjjj4 -n kube-system | grep Driver
Defaulted container "nvidia-gpu-device-plugin" out of: nvidia-gpu-device-plugin, nvidia-driver-installer (init), partition-gpus (init)
I1204 01:49:36.618905    5680 metrics.go:144] nvml initialized successfully. Driver version: 535.104.12

Per this output, we can see that the driver version is 535.104.12.
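The two manual steps above can be combined into a small script. This is only a sketch: the function names `extract_driver_version` and `gpu_driver_version` are made up for illustration, and it assumes the Pod and container names match the output shown above (`nvidia-gpu-device-plugin`) and that the log line keeps the "Driver version: X" format.

```shell
#!/bin/sh
# Hypothetical helper: read device-plugin log lines on stdin and print the
# first driver version found (assumes the "Driver version: X" log format).
extract_driver_version() {
  sed -n 's/.*Driver version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: pick the first running device-plugin Pod in kube-system
# and extract the driver version from its logs.
gpu_driver_version() {
  pod="$(kubectl get pods -n kube-system -o name \
    | grep nvidia-gpu-device-plugin | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no nvidia-gpu-device-plugin Pod found" >&2
    return 1
  fi
  # -c names the container explicitly, which avoids the "Defaulted container" notice.
  kubectl logs -n kube-system "$pod" -c nvidia-gpu-device-plugin \
    | extract_driver_version
}
```

After sourcing the script, running `gpu_driver_version` against a cluster with GPU nodes should print just the version string. If no GPU node is currently provisioned, there is no Pod to query and the function returns an error.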

This version is set by Autopilot and cannot be changed at present.

Upvotes: 0

Kranthiveer Dontineni

Reputation: 1543

In Autopilot clusters, GKE manages driver version selection and installation. If you need the list of GPU driver versions associated with a GKE version, refer to the corresponding Container-Optimized OS page linked in the GKE current versions table.

For example, if you have selected GKE version 1.25.7-gke.1000, the available COS version is cos-101-17162-127-27, and the supported GPU driver versions are v470.182.03 (default) and v525.105.17.

You can follow this documentation to deploy your GPU workloads on an Autopilot cluster.

Edit 1: The passage quoted below applies to Standard clusters, not Autopilot.


After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that you can apply to install the drivers. On GPU nodes that use Container-Optimized OS images, you also have the option of selecting between the default GPU driver version and a newer version.


Note: This content is taken from the official Google Cloud documentation linked in this answer.

Upvotes: 3
