AVarf

Reputation: 5169

How to train a model on multi gpus with tensorflow2 and keras?

I have an LSTM model that I want to train on multiple GPUs. I transformed the code to do this, and in nvidia-smi I can see that it is using all of the memory on all of the GPUs and that each GPU is at around 40% utilization, BUT the estimated training time per batch is almost the same as with a single GPU.

Can someone please guide me and tell me how I can train properly on multiple GPUs?

My code:

import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout

import os
from tensorflow.keras.callbacks import ModelCheckpoint



checkpoint_path = "./model/"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = ModelCheckpoint(filepath=checkpoint_path, save_freq='epoch', verbose=1)

# NNET - LSTM
# Build and compile the model inside the strategy scope so that its
# variables are mirrored across all visible GPUs.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    regressor = Sequential()

    regressor.add(LSTM(units = 180, return_sequences = True, input_shape = (X_train.shape[1], 3)))
    regressor.add(Dropout(0.2))

    regressor.add(LSTM(units = 180, return_sequences = True))
    regressor.add(Dropout(0.2))

    regressor.add(LSTM(units = 180))
    regressor.add(Dropout(0.2))

    regressor.add(Dense(units = 4))

    regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

regressor.fit(X_train, y_train, epochs = 10, batch_size = 32, callbacks=[cp_callback])

Upvotes: 1

Views: 1364

Answers (2)

Srihari Humbarwadi

Reputation: 2632

Assume that your batch_size for a single GPU is N and the time taken per batch is X seconds.

You can measure training speed by measuring the time the model takes to converge, but you have to make sure that you feed in the right batch_size with 2 GPUs. Since 2 GPUs have twice the memory of a single GPU, you should linearly scale your batch_size to 2N. It might be deceiving to see that the model still takes X seconds per batch, but you should keep in mind that the model now sees 2N samples per batch, which leads to quicker convergence because you can now train with a higher learning rate.
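For example, a rough sketch of that scaling (N here is just a placeholder for your single-GPU batch size, and the commented fit call refers to the regressor/X_train from the question):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Assumed single-GPU batch size; scale it by the number of replicas so
# each GPU still processes N samples per step.
N = 32
global_batch_size = N * strategy.num_replicas_in_sync  # 2N on 2 GPUs

# regressor.fit(X_train, y_train, epochs=10, batch_size=global_batch_size)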

If both of your GPUs have their memory fully allocated but are sitting at around 40% utilization, there might be multiple reasons:

  • The model is too simple and you don't need all that compute.
  • Your batch_size is small and your GPUs can handle a bigger batch_size
  • Your CPU is the bottleneck, making the GPUs wait for the data to be ready; this can be the case when you see spikes in GPU utilization
  • You need to write a better, more performant data pipeline (see the sketch below). You can find more about efficient input pipelines here - https://www.tensorflow.org/guide/data_performance
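For instance, a minimal sketch of a tf.data input pipeline along the lines of that guide (X_train/y_train are assumed to be the arrays from the question):

import tensorflow as tf

def make_dataset(X, y, batch_size):
    ds = tf.data.Dataset.from_tensor_slices((X, y))
    ds = ds.shuffle(buffer_size=10_000)
    ds = ds.batch(batch_size)
    # Overlap data preparation on the CPU with training on the GPUs.
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

# train_ds = make_dataset(X_train, y_train, batch_size=64)
# regressor.fit(train_ds, epochs=10, callbacks=[cp_callback])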

Upvotes: 2

Suraj Subbarao

Reputation: 89

You can try using CuDNNLSTM. It's way faster than the regular LSTM layer.

https://www.tensorflow.org/api_docs/python/tf/compat/v1/keras/layers/CuDNNLSTM
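For reference, a hedged sketch of what swapping the layers could look like (not from the answer itself; CuDNNLSTM only runs on a GPU, and the 60-timestep input shape is just an assumed example):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

CuDNNLSTM = tf.compat.v1.keras.layers.CuDNNLSTM

regressor = Sequential()
# CuDNNLSTM uses the fused cuDNN kernel; it does not support
# recurrent_dropout or a custom activation.
regressor.add(CuDNNLSTM(units=180, return_sequences=True, input_shape=(60, 3)))
regressor.add(Dropout(0.2))
regressor.add(CuDNNLSTM(units=180))
regressor.add(Dropout(0.2))
regressor.add(Dense(units=4))
regressor.compile(optimizer='adam', loss='mean_squared_error')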

Upvotes: 0
