Reputation: 2602
I'm trying to take the weights from a very simple Caffe model and convert them into a fully functional Keras model.
This is the original definition of the model in Caffe, let's call it simple.prototxt:
input: "im_data"
input_shape {
dim: 1
dim: 3
dim: 1280
dim: 1280
}
layer {
name: "conv1"
type: "Convolution"
bottom: "im_data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
pad: 5
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
pad: 0
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
kernel_size: 5
pad: 2
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
The layer definition in Caffe might look complex, but the model simply takes an image of dimensions 1280x1280x3, passes it through a convolutional layer, max pools it, applies local response normalization (LRN), and passes the result to the final convolutional layer.
Here is its implementation in Keras, which is much simpler:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, BatchNormalization
from keras.activations import relu, softmax
im_data = Input(shape=(1280, 1280, 3),
dtype='float32',
name='im_data')
conv1 = Conv2D(filters=96,
kernel_size=11,
strides=(4, 4),
activation=relu,
padding='same',
name='conv1')(im_data)
pooling1 = MaxPooling2D(pool_size=(3, 3),
strides=(2, 2),
padding='same',
name='pooling1')(conv1)
normalized1 = BatchNormalization()(pooling1) # https://stats.stackexchange.com/questions/145768/importance-of-local-response-normalization-in-cnn
conv2 = Conv2D(filters=256,
kernel_size=5,
activation=relu,
padding='same',
name='conv2')(normalized1)
model = Model(inputs=[im_data], outputs=conv2)
Although both models seem to have similar parameters in each layer, the problem is that their weight shapes are not equal. I am aware that Caffe uses a different shape order than Keras, but ordering is not the concern here.
The problem is that the last convolutional layer in Keras has a different value in its 3rd dimension compared to the last convolutional layer in Caffe. See below.
Weight shapes for Caffe:
>>> net = caffe.Net('simple.prototxt', 'premade_weights.caffemodel', caffe.TEST)
>>> for i in range(len(net.layers)):
... if len(net.layers[i].blobs) != 0: # only layers that have weights
... print(("name", net._layer_names[i]))
... print("weight_shapes", [v.data.shape for v in net.layers[i].blobs])
('name', 'conv1')
('weight_shapes', [(96, 3, 11, 11), (96,)])
('name', 'conv2')
('weight_shapes', [(256, 48, 5, 5), (256,)])
Weight shapes for Keras:
>>> for layer in model.layers:
... if len(layer.get_weights()) != 0:
... print(("name", layer.name))
... print(("weight_shapes", [w.shape for w in layer.get_weights()]))
('name', 'conv1')
('weight_shapes', [(11, 11, 3, 96), (96,)])
('name', 'conv2')
('weight_shapes', [(5, 5, 96, 256), (256,)])
This seems to be weird behavior. As you can see, the conv1 shapes in Caffe and Keras are equal (ignoring the order). But the Caffe conv2 shape is [(256, 48, 5, 5), (256,)], whereas the Keras conv2 shape is [(5, 5, 96, 256), (256,)]; notice that 48*2=96.
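For reference, "ignoring the order" just means a transpose: a Caffe kernel blob of shape (out, in, h, w) corresponds to Keras' (h, w, in, out). A minimal sketch of copying conv1 (whose shapes already correspond), assuming the net and model objects from above:
import numpy as np
caffe_w, caffe_b = [p.data for p in net.params['conv1']]  # (96, 3, 11, 11), (96,)
keras_w = np.transpose(caffe_w, (2, 3, 1, 0))             # -> (11, 11, 3, 96)
model.get_layer('conv1').set_weights([keras_w, caffe_b])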
Also, notice that the conv2 layer comes directly after the max pooling layer, so there might be something wrong with the max pooling layer in Keras.
Did I correctly translate the model definition from Caffe to Keras, especially the max pooling layer and its parameters?
Thank you very much!
Upvotes: 4
Views: 1935
Reputation: 5064
Pay attention to the group: 2 field in your conv2 definition. That means you have a grouped convolution there (Caffe: What does the group param mean?). Technically it means that you have two filter banks, each of shape (128, 48, 5, 5). The first one convolves with the first 48 input channels and produces the first 128 outputs; the second one handles the remaining ones. However, Caffe stores the two weight sets in a single blob, which is why its shape is (128x2, 48, 5, 5) = (256, 48, 5, 5).
There is no such parameter in the Keras Conv2D layer, but a widely adopted workaround is to split the input feature map with Lambda layers, process the halves with two distinct convolutional layers, and then merge them back into a single feature map.
from keras.layers import Lambda, Concatenate
normalized1_1 = Lambda(lambda x: x[:, :, :, :48])(normalized1)
normalized1_2 = Lambda(lambda x: x[:, :, :, 48:])(normalized1)
conv2_1 = Conv2D(filters=128,
kernel_size=5,
activation=relu,
padding='same',
name='conv2_1')(normalized1_1)
conv2_2 = Conv2D(filters=128,
kernel_size=5,
activation=relu,
padding='same',
name='conv2_2')(normalized1_2)
conv2 = Concatenate(name='conv_2_merge')([conv2_1, conv2_2])
I did not check the code for correctness, but the idea must be something like this.
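For copying the Caffe weights into this split version, a rough (untested) sketch could look like the following, assuming net from the question and a model rebuilt with the conv2_1/conv2_2 layers above; the grouped blob is simply cut in half along the output axis and transposed to the Keras order:
import numpy as np
w, b = [p.data for p in net.params['conv2']]   # w: (256, 48, 5, 5), b: (256,)
w1, b1 = w[:128], b[:128]                      # first group  -> conv2_1
w2, b2 = w[128:], b[128:]                      # second group -> conv2_2
model.get_layer('conv2_1').set_weights([np.transpose(w1, (2, 3, 1, 0)), b1])  # (5, 5, 48, 128)
model.get_layer('conv2_2').set_weights([np.transpose(w2, (2, 3, 1, 0)), b2])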
Concerning your task in general: converting networks from Caffe to Keras can be tricky. To get exactly the same result, you will run into a lot of subtle things, like asymmetric padding in convolutions or different max-pooling behavior. In particular, if you import the weights from Caffe, you probably cannot simply replace the LRN layer with a BatchNormalization layer, since they compute different things. Fortunately, there are implementations of LRN for Keras, for instance here.
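For example, one option (a sketch, not necessarily what the linked implementation does) is to wrap TensorFlow's tf.nn.local_response_normalization in a Lambda layer; be careful with the parameter mapping, since Caffe divides alpha by local_size while TF does not:
import tensorflow as tf
from keras.layers import Lambda
# Caffe: local_size=5, alpha=0.0001, beta=0.75
lrn1 = Lambda(lambda x: tf.nn.local_response_normalization(
    x, depth_radius=2, bias=1.0, alpha=0.0001 / 5, beta=0.75),
    name='norm1')(pooling1)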
Upvotes: 6