Vishruit Kulshreshtha

Reputation: 198

Data parallelism in Keras

I am looking for data parallelism in Keras (TensorFlow backend), not model parallelism. I am performing video classification on video files, and hence can only fit a batch of size 2 on the GPU. So I was wondering whether there is a way to use multiple GPUs in order to increase my batch size for better gradient estimates and faster training. Can you suggest an effective way to do this?

I am using one 12 GB Titan X and one 6 GB Titan Black.

Thanks

Upvotes: 4

Views: 2040

Answers (1)

Jonathan

Reputation: 832

This is one way to do it:

The method to_multi_gpu takes a model (defined with Keras 2.0 for a single GPU) and returns that same model replicated (with shared parameters) across multiple GPUs. The input to the new model is sliced evenly, each slice is passed to one of the replicas, and the outputs of all replicas are concatenated at the end.

import tensorflow as tf  # needed for tf.device below
from keras import backend as K
from keras.models import Model
from keras.layers import Input
from keras.layers.core import Lambda
from keras.layers.merge import Concatenate

def slice_batch(x, n_gpus, part):
    """
    Divide the input batch into [n_gpus] slices, and obtain slice number [part].
    e.g. if len(x) == 10, then slice_batch(x, 2, 1) returns x[5:].
    """
    sh = K.shape(x)
    L = sh[0] // n_gpus
    if part == n_gpus - 1:
        return x[part*L:]
    return x[part*L:(part+1)*L]


def to_multi_gpu(model, n_gpus=2):
    """
    Given a keras [model], return an equivalent model which parallelizes
    the computation over [n_gpus] GPUs.

    Each GPU gets a slice of the input batch, applies the model on that slice
    and later the outputs of the models are concatenated to a single tensor, 
    hence the user sees a model that behaves the same as the original.
    """
    # Keep the full-batch input on the CPU; each GPU pulls its own slice from it.
    with tf.device('/cpu:0'):
        x = Input(model.input_shape[1:], name=model.input_names[0])

    towers = []
    for g in range(n_gpus):
        with tf.device('/gpu:' + str(g)):
            # Each tower applies the shared model to its own slice of the batch.
            slice_g = Lambda(slice_batch,
                             lambda shape: shape,
                             arguments={'n_gpus': n_gpus, 'part': g})(x)
            towers.append(model(slice_g))

    with tf.device('/cpu:0'):
        # Concatenate the per-GPU outputs back into a single batch on the CPU.
        merged = Concatenate(axis=0)(towers)

    return Model(inputs=[x], outputs=[merged])
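
For completeness, here is a minimal sketch of how the wrapper might be used; the small Dense model and random data below are placeholders (not part of the original question, which concerns video classification):

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# A placeholder single-GPU model (hypothetical; substitute your own classifier).
inp = Input(shape=(100,))
hidden = Dense(64, activation='relu')(inp)
out = Dense(10, activation='softmax')(hidden)
base_model = Model(inputs=inp, outputs=out)

# Replicate it over 2 GPUs, then compile and train the wrapped model as usual.
parallel_model = to_multi_gpu(base_model, n_gpus=2)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')

x_train = np.random.random((32, 100)).astype('float32')
y_train = np.random.random((32, 10)).astype('float32')
parallel_model.fit(x_train, y_train, batch_size=8, epochs=1)

With this setup, each batch of 8 should be split into two slices of 4, one per GPU, and the concatenated outputs feed the loss, so training behaves like a single model seeing the full batch.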

Upvotes: 5
