Reputation: 2602
I'm trying to take the weights from a very simple Caffe model and convert them into a fully functional Keras model.
This is the original definition of the model in Caffe, let's call it simple.prototxt:
input: "im_data"
input_shape {
dim: 1
dim: 3
dim: 1280
dim: 1280
}
layer {
name: "conv1"
type: "Convolution"
bottom: "im_data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
pad: 5
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
pad: 0
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
kernel_size: 5
pad: 2
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
The layer definition in Caffe might look complex, but the model simply takes an image of dimensions 1280x1280x3, passes it through a convolutional layer, max pools it, applies local response normalization (LRN), and passes the result to the final convolutional layer.
Here is its implementation in Keras, which is much simpler:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization
from keras.activations import relu, softmax
im_data = Input(shape=(1280, 1280, 3),
dtype='float32',
name='im_data')
conv1 = Conv2D(filters=96,
kernel_size=11,
strides=(4, 4),
activation=relu,
padding='same',
name='conv1')(im_data)
pooling1 = MaxPooling2D(pool_size=(3, 3),
strides=(2, 2),
padding='same',
name='pooling1')(conv1)
normalized1 = BatchNormalization()(pooling1) # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
conv2 = Conv2D(filters=256,
kernel_size=5,
activation=relu,
padding='same',
name='conv2')(normalized1)
model = Model(inputs=[im_data], outputs=conv2)
Although both models seem to have similar parameters in each layer, the problem is that their weight shapes are not equal. I am aware that Caffe uses a different shape order than Keras, but ordering is not the concern here.
The problem is that the last convolutional layer in Keras has a different value in its 3rd dimension compared to the last convolutional layer in Caffe. See below.
Weight shapes for Caffe:
>>> net = caffe.Net('simple.prototxt', 'premade_weights.caffemodel', caffe.TEST)
>>> for i in range(len(net.layers)):
... if len(net.layers[i].blobs) != 0: # only layers that have weights
... print(("name", net._layer_names[i]))
... print("weight_shapes", [v.data.shape for v in net.layers[i].blobs])
('name', 'conv1')
('weight_shapes', [(96, 3, 11, 11), (96,)])
('name', 'conv2')
('weight_shapes', [(256, 48, 5, 5), (256,)])
Weight shapes for Keras:
>>> for layer in model.layers:
... if len(layer.get_weights()) != 0:
... print(("name", layer.name))
... print(("weight_shapes", [w.shape for w in layer.get_weights()]))
('name', 'conv1')
('weight_shapes', [(11, 11, 3, 96), (96,)])
('name', 'conv2')
('weight_shapes', [(5, 5, 96, 256), (256,)])
This seems to be weird behavior. As you can see, the conv1 shapes in Caffe and Keras are equal (ignoring the order). But the Caffe conv2 shape is [(256, 48, 5, 5), (256,)], whereas the Keras conv2 shape is [(5, 5, 96, 256), (256,)]; notice that 48*2=96.
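For reference, "ignoring the order" just means a transpose: a Caffe kernel blob of shape (out, in, h, w) corresponds to Keras' (h, w, in, out). A minimal sketch of copying conv1 (whose shapes already correspond), assuming the net and model objects from above:
import numpy as np
caffe_w, caffe_b = [p.data for p in net.params['conv1']]  # (96, 3, 11, 11), (96,)
keras_w = np.transpose(caffe_w, (2, 3, 1, 0))             # -> (11, 11, 3, 96)
model.get_layer('conv1').set_weights([keras_w, caffe_b])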
Also, notice that the conv2 layer comes directly after the max pooling layer, so there might be something wrong with the max pooling layer in Keras.
Did I correctly translate the model definition from Caffe to Keras, especially the max pooling layer and its parameters?
Thank you very much!
Upvotes: 4
Views: 1935
Reputation: 5064
Pay attention to the group: 2 field in your conv2 definition. That means you have a grouped convolution there (Caffe: What does the group param mean?). Technically it means that you have two filter banks, each of shape (128, 48, 5, 5). The first one convolves with the first 48 input channels and produces the first 128 outputs; the second one handles the remaining ones. However, Caffe stores the two weight sets in a single blob, which is why its shape is (128x2, 48, 5, 5) = (256, 48, 5, 5).
There is no such parameter in the Keras Conv2D layer, but a widely adopted workaround is to split the input feature map with Lambda layers, process the halves with two distinct convolutional layers, and then merge them back into a single feature map.
from keras.layers import Lambda, Concatenate
normalized1_1 = Lambda(lambda x: x[:, :, :, :48])(normalized1)
normalized1_2 = Lambda(lambda x: x[:, :, :, 48:])(normalized1)
conv2_1 = Conv2D(filters=128,
kernel_size=5,
activation=relu,
padding='same',
name='conv2_1')(normalized1_1)
conv2_2 = Conv2D(filters=128,
kernel_size=5,
activation=relu,
padding='same',
name='conv2_2')(normalized1_2)
conv2 = Concatenate(name='conv_2_merge')([conv2_1, conv2_2])
I did not check the code for correctness, but the idea must be something like this.
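For copying the Caffe weights into this split version, a rough (untested) sketch could look like the following, assuming net from the question and a model rebuilt with the conv2_1/conv2_2 layers above; the grouped blob is simply cut in half along the output axis and transposed to the Keras order:
import numpy as np
w, b = [p.data for p in net.params['conv2']]   # w: (256, 48, 5, 5), b: (256,)
w1, b1 = w[:128], b[:128]                      # first group  -> conv2_1
w2, b2 = w[128:], b[128:]                      # second group -> conv2_2
model.get_layer('conv2_1').set_weights([np.transpose(w1, (2, 3, 1, 0)), b1])  # (5, 5, 48, 128)
model.get_layer('conv2_2').set_weights([np.transpose(w2, (2, 3, 1, 0)), b2])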
Concerning your task in general: converting networks from Caffe to Keras can be tricky. To get exactly the same result, you will run into a lot of subtle things, like asymmetric padding in convolutions or different max-pooling behavior. In particular, if you import the weights from Caffe, you probably cannot simply replace the LRN layer with a BatchNormalization layer, since they compute different things. Fortunately, there are implementations of LRN for Keras, for instance here.
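For example, one option (a sketch, not necessarily what the linked implementation does) is to wrap TensorFlow's tf.nn.local_response_normalization in a Lambda layer; be careful with the parameter mapping, since Caffe divides alpha by local_size while TF does not:
import tensorflow as tf
from keras.layers import Lambda
# Caffe: local_size=5, alpha=0.0001, beta=0.75
lrn1 = Lambda(lambda x: tf.nn.local_response_normalization(
    x, depth_radius=2, bias=1.0, alpha=0.0001 / 5, beta=0.75),
    name='norm1')(pooling1)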
Upvotes: 6