Reputation: 11
We have a single GPU (an NVIDIA GeForce GTX Titan X) installed in a server, and we want to run more than one ML model on it concurrently. This server is a node in a Kubernetes cluster, and Kubeflow is also installed.
If we SSH into the server we can run multiple jobs on the GPU. When creating notebook servers via the Kubeflow UI there is an option to select 1, 2, 3... GPUs; in our case, we select one. However, two notebook servers can't be up and running concurrently.
Looking further into Kubernetes, we see that if one notebook server with a GPU is running and we try to start a second one, the pod created by the second server is left unschedulable (reported as tainted).
Is there a way to configure Kubeflow to allow this kind of behaviour?
P.S. We are using Kubeflow 1.4
Upvotes: 1
Views: 494
Reputation: 3
I have worked with Kubernetes (k3s) and GPUs, and managing multiple pods on a single GPU was a challenging task. After some research I found nvshare.
They state:
"nvshare is a GPU sharing mechanism that allows multiple processes (or containers running on Kubernetes) to securely run on the same physical GPU concurrently, each having the whole GPU memory available."
It worked for me.
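For reference, here is a minimal pod spec sketch, assuming nvshare's device plugin is deployed and exposes an nvshare.com/gpu extended resource (the resource name is taken from the project's README; verify it against the version you install):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvshare-test
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: nvidia/cuda:12.2.0-base-ubuntu22.04  # any CUDA-capable image
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvshare.com/gpu: 1  # request the shared GPU via nvshare instead of nvidia.com/gpu
```

Several pods can request nvshare.com/gpu: 1 at the same time and land on the same physical GPU.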
Upvotes: 0
Reputation: 9701
Since Kubeflow runs on Kubernetes, the GPU time-slicing documentation may give you more information on this. The document was last updated this year (2024), so it's fairly current.
What this technology (a plugin for Kubernetes) does is basically interleave requests for a GPU's resources. It's not literally concurrent, but then again neither is what happens on most operating systems, where a handful of cores run hundreds if not thousands of processes "simultaneously". So, just like in that more common case, your pods may have to wait for some amount of time to get to the GPU, but at least they are queued.
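As a rough sketch of what enabling time-slicing looks like with the NVIDIA device plugin / GPU Operator (the replicas count, ConfigMap name and namespace here are illustrative; check the linked docs for your exact setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4  # the node advertises 4 schedulable nvidia.com/gpu units backed by one physical GPU
```

Once the device plugin picks this config up, each notebook server can still request nvidia.com/gpu: 1, but up to four such pods fit on the node and share the physical GPU in time slices.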
An older blog post shows a possible way to split the GPU's memory in order to provide it to multiple pods.
Alternatively, you can check out Multi-Instance GPU (MIG), although that one is reserved for the latest generations of AI accelerators (e.g. A100, H100 and A30), so it won't help on a GTX Titan X, but it's worth at least a read.
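For completeness, on MIG-capable hardware a pod requests a MIG slice as an extended resource, along these lines (this sketch assumes NVIDIA's "single" MIG strategy and an A100 partitioned into 1g.5gb instances; the exact resource name depends on how the GPU is partitioned):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-test
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1  # one 1-compute-slice / 5 GB memory instance of the GPU
```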
Upvotes: 0