jimypbr

Reputation: 148

Keras/Tensorflow multi GPU InvalidArgumentError in optimizer

I want to try multi GPU training in Keras with Tensorflow backend.

I am trying the function make_parallel described here: https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012. The code for that is here (updated for Keras 2):

from keras.layers import concatenate
from keras.layers.core import Lambda
from keras.models import Model

import tensorflow as tf

def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
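        # Take the idx-th contiguous slice along the batch dimension.
        # Note: the integer division by `parts` drops any remainder, so a
        # batch whose size is not divisible by `parts` loses samples here.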
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    #Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:

                inputs = []
                #Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
                    inputs.append(slice_n)                

                outputs = model(inputs)

                if not isinstance(outputs, list):
                    outputs = [outputs]

                #Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(concatenate(outputs, axis=0))

        return Model(inputs=model.inputs, outputs=merged)

I create a model:

model = make_parallel(create_model(...), 4)
model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mse',])

After running fit, it trains for a single epoch and then crashes with the following exception:

InvalidArgumentError (see above for traceback): Incompatible shapes: [120,1] vs. [122,1]
     [[Node: training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss_10/concatenate_7_loss/sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/Shape/_10935, training_6/Adam/gradients/loss_10/concatenate_7_loss/sub_grad/Shape_1)]]
     [[Node: training_6/Adam/gradients/concatenate_7/concat_grad/Slice_1/_11003 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:1", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4728_training_6/Adam/gradients/concatenate_7/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:1"]()]]

Something goes wrong at the stage where the gradients from the models on the different GPUs are combined. The incompatible shape sizes in the exception are related to the batch size (128 here) in some way, i.e. changing the batch size changes the incompatible shape sizes.
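
For what it's worth, mirroring the get_slice logic in plain NumPy (a minimal sketch of my reading of that function, not the actual Keras code path) shows how a batch whose size is not divisible by the GPU count silently loses samples after slicing and re-concatenation, which would explain shapes like [120,1] vs. [122,1]:

import numpy as np

def get_slice_np(data, idx, parts):
    # Same arithmetic as get_slice above: integer division drops the remainder
    size = data.shape[0] // parts
    return data[idx * size:(idx + 1) * size]

batch = np.zeros((122, 1))  # e.g. a final, smaller batch at the end of an epoch
slices = [get_slice_np(batch, i, 4) for i in range(4)]
print(np.concatenate(slices, axis=0).shape)  # (120, 1), but the targets still have 122 rows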

Upvotes: 1

Views: 680

Answers (2)

Asif Patankar

Reputation: 97

As of December 2020, rearranging the MaxPooling2D layer(s) solved the problem.

Upvotes: 0

kww

Reputation: 549

Your issue seems to be similar to the one reported here. It appears that the number of input samples must be a multiple of the number of GPUs.

From the link:

The number of samples just needs to be a multiple of the total number of GPUs.

Ex. I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.
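
In practice that means trimming the training data to the nearest multiple of the GPU count before calling fit. A minimal sketch (the array names and sizes are hypothetical placeholders):

import numpy as np

gpu_count = 8
x_train = np.random.rand(68531, 10)  # placeholder data
y_train = np.random.rand(68531, 1)

# Trim to the nearest multiple of gpu_count: 68531 -> 68528
n = (len(x_train) // gpu_count) * gpu_count
x_train, y_train = x_train[:n], y_train[:n]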

Upvotes: 1
