Reputation: 148
I want to try multi-GPU training in Keras with the TensorFlow backend.
I am trying the make_parallel function
described here: https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012. The code for that (updated for Keras 2) is:
from keras.layers import concatenate
from keras.layers.core import Lambda
from keras.models import Model

import tensorflow as tf

def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:

                inputs = []
                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape,
                                     arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(concatenate(outputs, axis=0))

        return Model(inputs=model.inputs, outputs=merged)
I create a model:
model = make_parallel(create_model(...), 4)
model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mse',])
After running fit, it trains for a single epoch and then crashes with the following exception:
InvalidArgumentError (see above for traceback): Incompatible shapes: [120,1] vs. [122,1]
[[Node: training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss_10/concatenate_7_loss/sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/Shape/_10935, training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/Shape_1)]]
[[Node: training_6/Adam/gradients/concatenate_7/concat_grad/Slice_1/_11003 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:1", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4728_training_6/Adam/gradients/concatenate_7/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:1"]()]]
Something goes wrong at the stage of combining the gradients from the models on the different GPUs. The incompatible shape sizes in the exception are related to the batch size (128 here) in some way (i.e., changing the batch size changes the incompatible shape sizes).
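My guess (unverified) is that it comes from the integer division in get_slice: when the number of samples in a batch is not a multiple of gpu_count, each tower gets the truncated share and the concatenated output ends up shorter than the targets. A quick sanity check with the numbers from the exception above (4 GPUs, a leftover batch of 122 samples):

# Rough check only -- these numbers are taken from the exception, not from real code.
samples_in_batch = 122                        # size of the last, partial batch
gpu_count = 4                                 # as passed to make_parallel
per_tower = samples_in_batch // gpu_count     # 30 samples per tower (integer division)
merged_rows = per_tower * gpu_count           # 120 rows after concatenating tower outputs
print(merged_rows, samples_in_batch)          # 120 vs. 122 -> "Incompatible shapes: [120,1] vs. [122,1]"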
Upvotes: 1
Views: 680
Reputation: 97
As of December 2020, rearranging the "MaxPooling2D" layer(s) solved the problem.
Upvotes: 0
Reputation: 549
Your issue seems to be similar to the one reported here. It appears that the input data size must be a multiple of the number of GPUs.
From the link:
The number of samples just needs to be a multiple of the total number of GPUs.
Ex. I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.
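A minimal sketch of that workaround, assuming the usual x_train / y_train NumPy arrays (names and shapes here are placeholders, not taken from the question), is to trim the data to a multiple of the GPU count before calling fit:

import numpy as np

gpu_count = 4                          # same value passed to make_parallel
x_train = np.random.rand(68531, 10)    # placeholder data, shapes are illustrative
y_train = np.random.rand(68531, 1)

# Keep only a multiple of gpu_count samples (68531 -> 68528 here).
usable = (len(x_train) // gpu_count) * gpu_count
x_train, y_train = x_train[:usable], y_train[:usable]

# batch_size should also stay divisible by gpu_count (128 / 4 = 32 per tower).
model.fit(x_train, y_train, batch_size=128, epochs=10)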
Upvotes: 1