Reputation: 24131
I am using TensorFlow 1.9, on an NVIDIA GPU with 3 GB of memory. The size of my minibatch is 100 MB. Therefore, I could potentially fit multiple minibatches on my GPU at the same time. So my question is about whether this is possible and whether it is standard practice.
For example, when I train my TensorFlow model, I run something like this on every epoch:
loss_sum = 0
for batch_num in range(num_batches):
    batch_inputs = get_batch_inputs()
    batch_labels = get_batch_labels()
    # One weight update per minibatch
    batch_loss, _ = sess.run([loss_op, train_op], feed_dict={inputs: batch_inputs, labels: batch_labels})
    loss_sum += batch_loss
loss = loss_sum / num_batches
This iterates over my minibatches and performs one weight update per minibatch. But each pair of batch_inputs and batch_labels is only 100 MB, so the majority of the GPU memory is not being used.
One option would be to just increase the minibatch size so that the minibatch is closer to the 3 GB GPU capacity. However, I want to keep the same small minibatch size to help with optimisation.
So the other option might be to send multiple minibatches through the GPU in parallel, and perform one weight update per minibatch. Being able to send the minibatches in parallel would significantly reduce the training time.
Is this possible and recommended?
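For reference, one related thing I could do instead is the following minimal sketch: it is not parallel minibatches, but it uses tf.data to prefetch the next minibatch on the CPU while the GPU trains on the current one, keeping the same small minibatch size and one weight update per minibatch. The names all_inputs, all_labels, batch_size and build_model are hypothetical stand-ins for my actual data loading and model construction:
import tensorflow as tf

# Hypothetical in-memory arrays; shuffle, batch, and prefetch one batch ahead
dataset = tf.data.Dataset.from_tensor_slices((all_inputs, all_labels))
dataset = dataset.shuffle(buffer_size=10000).batch(batch_size).prefetch(1)
iterator = dataset.make_one_shot_iterator()
next_inputs, next_labels = iterator.get_next()

# Build the graph on the iterator tensors instead of feed_dict placeholders;
# each sess.run(train_op) then consumes one prefetched minibatch
loss_op, train_op = build_model(next_inputs, next_labels)
This overlaps data preparation with GPU compute, but it does not answer the parallel-minibatch question itself.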
Upvotes: 4
Views: 1688
Reputation: 11333
Thought I might point out that arbitrarily making the batch size large (when you have large amounts of memory) can sometimes hurt the generalization of your model.
References:
Train longer, generalize better
On Large-Batch Training for Deep Learning.
Upvotes: 0
Reputation: 11817
The goal of the mini-batch approach is to update the weights of your network after each batch is processed and to use the updated weights in the next mini-batch. If you do some clever trick and batch several mini-batches together, they would effectively all use the same old weights.
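To make that concrete, here is a minimal sketch in plain NumPy (with a made-up squared-error loss, not your actual model) contrasting sequential mini-batch updates with updates whose gradients are all computed at the same old weights, which is effectively what pushing several mini-batches through at once would do:
import numpy as np

def grad(w, batch):
    # Gradient of the toy loss 0.5 * (w - x)^2 averaged over the batch
    return np.mean(w - batch)

batches = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
lr = 0.5

# Sequential mini-batch SGD: each update sees the weights from the previous one
w_seq = 0.0
for b in batches:
    w_seq -= lr * grad(w_seq, b)

# "Parallel" variant: all gradients computed at the same old weights, then applied
w_par = 0.0
for g in [grad(0.0, b) for b in batches]:
    w_par -= lr * g

print(w_seq, w_par)  # 2.125 vs 2.5 -- the two schemes diverge
The two runs end at different weights; that lost dependency between updates is exactly why batching several mini-batches together gives you a different algorithm.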
The only potential benefit I can see is if the model works better with bigger mini-batches, e.g. if big_batches * more_epochs is better than mini_batches * less_epochs. I don't remember the theory behind mini-batch gradient descent, but I remember there is a reason you should use mini-batches instead of the whole training set for each iteration. On the other hand, the mini-batch size is a hyperparameter that has to be tuned anyway, so it's probably worth fiddling with it a bit.
Upvotes: 5