Reputation: 894
If I want to train Keras models and have multiple GPUs available, there are several ways of using them effectively:
1) Assign each GPU to a different model and train them in parallel (for example, for hyperparameter tuning or for comparing different architectures). For example, I assign model1 to GPU1 and model2 to GPU2, and after one global data loading operation, Keras runs model.fit() for each model in parallel, each on its own GPU.
2) Divide one model and train it in parallel across all GPUs. This is done by splitting the model into sequential chunks and then computing the gradients for the whole model. The way it is implemented, it would not work for different independent models.
3) Divide the data and feed different batches to the same model on different GPUs.
There seems to be a lot of documentation for 2) and 3):
https://keras.io/guides/distributed_training/
https://www.run.ai/guides/multi-gpu/keras-multi-gpu-a-practical-guide/
But I can't find any solution for 1), and the posts asking for it don't have a solution:
Train multiple keras/tensorflow models on different GPUs simultaneously
It seems that, with those options already available, it should be trivial to also have the option to assign a different GPU to each model, and train in parallel. Is there something I am missing?
EDIT: One proposed solution is simply to run different Python scripts. But this is not optimal, as it assigns a GPU per script, not per model, which means everything else in the script has to run twice, redundantly. If the data loading step is expensive this is very inefficient, as both scripts will be competing for data access.
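For concreteness, here is a rough sketch of the single-process pattern I have in mind. The per-thread tf.device pinning is my guess at how the GPU assignment could work; I don't know whether this is actually supported, which is what I am asking:

    import threading
    import tensorflow as tf
    from tensorflow import keras

    # One global (expensive) data loading step, shared by all models.
    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

    def build_model():
        return keras.Sequential([
            keras.Input(shape=(784,)),
            keras.layers.Dense(128, activation="relu"),
            keras.layers.Dense(10, activation="softmax"),
        ])

    def train_on_gpu(gpu_id):
        # Assumption: pin everything this thread creates to one GPU.
        with tf.device(f"/GPU:{gpu_id}"):
            model = build_model()
            model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy")
            model.fit(x_train, y_train, epochs=3, verbose=2)

    # One model per GPU, trained concurrently from the same loaded data.
    threads = [threading.Thread(target=train_on_gpu, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()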
Upvotes: 5
Views: 2796
Reputation: 2011
One possible solution, although I am aware it is not exactly what is desired, is to use TFRecords. This fits the scenario the OP describes where we run different Python scripts, each corresponding to some variation of the same model. What you should realize about training ANNs is that in most cases, while the GPU is busy, the CPU is relatively idle. When it comes to loading datasets there are two scenarios:
1. Load the entire dataset at the beginning (with n different variations of the same model, that is n copies of the same dataset in memory, which might be deadly with large data, and we spend n times as long loading data).
2. Create train/test data generators that are queried for each batch (the memory problem is partially solved, but then we will probably spend more time waiting for the data: reads from different disk locations, etc.).
The problem with the 2. scenario is that after each batch we have to wait for the next batch to be loaded, prepared (augmented, etc.) and transferred to the GPU. TensorFlow provides TFRecords, a binary format for storing data. Along with the format itself comes an API for querying data stored in this format, and the idea is that while the GPU is busy we can prepare the next batch on the CPU asynchronously and hence tackle the bottleneck. This is very well described here:
https://www.tensorflow.org/guide/data_performance
Of course, there is no single global loading step in this, but it is a good trade-off between low memory usage and fast dataset access. Depending on how much work the CPU has to do compared to the GPU, this might be a partial solution to your problem.
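A minimal sketch of such a pipeline, assuming TFRecord examples with "image" and "label" features (the file name and the parse function are placeholders to adapt to your data):

    import tensorflow as tf

    # Describe the features stored in each serialized example.
    feature_spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse_example(serialized):
        parsed = tf.io.parse_single_example(serialized, feature_spec)
        image = tf.io.decode_raw(parsed["image"], tf.uint8)
        image = tf.cast(image, tf.float32) / 255.0
        return image, parsed["label"]

    dataset = (
        tf.data.TFRecordDataset(["train.tfrecord"])  # placeholder file name
        .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .shuffle(10_000)
        .batch(64)
        # prefetch lets the CPU prepare the next batches while the GPU trains
        .prefetch(tf.data.AUTOTUNE)
    )

    # model.fit(dataset, epochs=10)

Each script can build its own pipeline like this over the same TFRecord files, so the dataset is read from disk in a streaming fashion instead of being held n times in memory.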
Upvotes: 1