Reputation: 1250
There is a shared server with 2 GPUs in my institution. Suppose two team members each want to train a model at the same time: how do they get Keras to train each model on a specific GPU so as to avoid resource conflicts?
Ideally, Keras would figure out which GPU is currently busy training a model and use the other GPU for the second model. However, this doesn't seem to be the case: by default Keras appears to use only the first GPU (the Volatile GPU-Util of the second GPU is always 0%).
Upvotes: 8
Views: 12111
Reputation: 1
If you want to train models on cloud GPUs (e.g. GPU instances from AWS), try this library:
!pip install aibro==0.0.45 --extra-index-url https://test.pypi.org/simple

from aibro.train import fit

machine_id = 'g4dn.4xlarge'  # instance name on AWS
job_id, trained_model, history = fit(
    model=model,
    train_X=train_X,
    train_Y=train_Y,
    validation_data=(validation_X, validation_Y),
    machine_id=machine_id,
)
Tutorial: https://colab.research.google.com/drive/19sXZ4kbic681zqEsrl_CZfB5cegUwuIB#scrollTo=ERqoHEaamR1Y
Upvotes: 0
Reputation: 563
If you are using a training script, you can simply set the environment variable on the command line when invoking the script:
CUDA_VISIBLE_DEVICES=1 python train.py
The other team member can launch theirs with CUDA_VISIBLE_DEVICES=0 to use the first GPU.
Upvotes: 7
Reputation: 1249
Possibly a duplicate of my previous question.
It's a bit more complicated: Keras will allocate the memory of both GPUs even though it only uses one GPU by default. Check keras.utils.multi_gpu_model for using several GPUs.
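For reference, a minimal sketch of data-parallel training with that utility (Keras 2.x API, removed in later releases; model, train_X, and train_Y are placeholder names):

from keras.utils import multi_gpu_model

# Replicate the model on 2 GPUs; each incoming batch is split between them.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
parallel_model.fit(train_X, train_Y, epochs=10, batch_size=256)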
I found the solution by choosing the GPU using the environment variable CUDA_VISIBLE_DEVICES.
You can add this manually before importing Keras or TensorFlow to choose your GPU:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # first GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # second GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # run on CPU
To make this automatic, I wrote a function that parses the output of nvidia-smi, detects which GPUs are already in use, and sets the variable accordingly.
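A minimal sketch of such a function (not the author's exact code; it assumes nvidia-smi is on the PATH and uses its standard --query-gpu flags):

import os
import subprocess

def pick_free_gpu():
    # Query each GPU's index and current utilization.
    output = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,utilization.gpu',
         '--format=csv,noheader,nounits'],
        encoding='utf-8')
    for line in output.strip().splitlines():
        index, util = (field.strip() for field in line.split(','))
        if int(util) == 0:  # idle GPU found
            os.environ['CUDA_VISIBLE_DEVICES'] = index
            return index
    os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # no free GPU: fall back to CPU
    return '-1'

pick_free_gpu()  # call this before importing keras / tensorflow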
Upvotes: 23