Reputation: 5635
I am trying to interpret and understand models written in Caffe's .prototxt format.
Yesterday I came across a sample deploy.prototxt by Shai here, as quoted below:
layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w" # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w" # NOTE THIS NAME: it's the same!
    lr_mult: 10 # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}
I understand this model definition as:
  data_a        data_b
    |             |
    V             V
---------     ---------
| ip1_a |     | ip1_b |
---------     ---------
    |             |
  ip1_a         ip1_b
    |             |
    V             V
  -------------------
  |     concat      |
  -------------------
           |
      ip1_combine
           |
           V
  -------------------
  |    joint_ip     |
  -------------------
           |
        joint_ip
Blob ip1_a is trained by layer ip1_a, with weights initialized from ip1_w (lr: 1) and bias initialized from ip1_b (lr: 2).
Blob ip1_a is actually the newly learned weights that were initialized with ip1_w. The learned bias doesn't have a name.
In some models we can find layers that simply have:
lr_mult: 1
lr_mult: 2
where the first instance of lr_mult always corresponds to the weights and the second to the bias.
Is my understanding above correct?
Upvotes: 1
Views: 353
Reputation: 114786
You are mixing two data types: the input (training) data and the net's parameters.
During training the input data is fixed to a known training/validation set and only the net's parameters are changed. In contrast, when deploying the net, the data changes to new images while the net's parameters are fixed. See this answer for an in-depth description of the way Caffe stores these two types of data.
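To make the distinction concrete, here is a minimal pycaffe sketch. It assumes the net above is saved as deploy.prototxt, that the full deploy file also declares the data_a and data_b input blobs, and that trained weights exist in weights.caffemodel (all file names are hypothetical):

import caffe

# load the deployed net (hypothetical file names)
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# net.blobs holds the DATA flowing through the net; it changes with every input
print(net.blobs['ip1_a'].data.shape)        # activations of the ip1_a branch
print(net.blobs['ip1_combine'].data.shape)  # concatenated activations

# net.params holds the learned PARAMETERS; they stay fixed at deploy time
w = net.params['ip1_a'][0]   # weight blob of layer ip1_a
b = net.params['ip1_a'][1]   # bias blob of layer ip1_a
print(w.data.shape, b.data.shape)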
In the example you showed, there are two input training data paths: data_a and data_b, which might be different images each time. The input blobs pass through an InnerProduct layer to become the ip1_a and ip1_b blobs, respectively. Then they are concatenated into a single blob, ip1_combine, which in turn is fed into the final InnerProduct layer.
On the other hand, the model has a set of parameters: ip1_w and ip1_b, the weights and bias of the first inner product layer. In this particular example the parameters were explicitly named to indicate that they are shared between the ip1_a and ip1_b layers.
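Continuing the sketch above, one quick (hedged) way to convince yourself that the two branches really share their parameters is to write through one layer and read back through the other:

# ip1_a and ip1_b both refer to the parameter blob named "ip1_w",
# so a change made through one branch should be visible through the other
net.params['ip1_a'][0].data[...] = 1.0
print(net.params['ip1_b'][0].data.mean())   # expected to print 1.0 if sharing works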
As for the two lr_mult entries: yes, the first is the LR multiplier of the weights, and the second one is for the bias term.
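The same ordering shows up in pycaffe (continuing the sketch above): for an InnerProduct layer, net.params[layer][0] is the weight blob and net.params[layer][1] is the bias blob, matching the first and second param { lr_mult: ... } entries in the prototxt.

weights = net.params['joint_ip'][0].data   # first param entry  -> weights
bias    = net.params['joint_ip'][1].data   # second param entry -> bias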
Upvotes: 1