Anoop K. Prabhu

Reputation: 5635

Interpreting Caffe models

I am trying to interpret and understand models written in Caffe's .prototxt format.

Yesterday I came across a sample 'deploy.prototxt' by Shai here, quoted below:

layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME: it's the same!
    lr_mult: 10    # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}

I understand this model definition as:

     data_a         data_b
        |             |
        |             |
     -------       -------   
    | ip1_a |     | ip1_b |
     -------       -------
        |             |
        |             |
      ip1_a         ip1_b
        |             |
        |             |
        V             V
        ~~~~~~~~~~~~~~~
               |
               |
               V
         ------------- 
        |    concat   |
         ------------- 
               |
               |
         ip1_combine
               |
               |
         ------------- 
        |   joint_ip  |
         ------------- 
               |
               |
            joint_ip   

Blob ip1_a is trained by layer ip1_a, with weights initialized with ip1_w (lr: 1) and bias initialized with ip1_b (lr: 2). Blob ip1_a is actually the newly learned weights, which were initialized with ip1_w. The learned bias doesn't have a name.

In some models, we can find layers that have:

lr_mult:1
lr_mult:2

where the first instance of lr_mult always corresponds to the weights and the second to the bias.

Are my above understandings correct?

Upvotes: 1

Views: 353

Answers (1)

Shai

Reputation: 114786

You are mixing two data types: the input (training) data and the net's parameters.
During training, the input data is fixed to a known training/validation set and only the net parameters change. In contrast, when deploying the net, the data changes to new images while the net parameters are fixed. See this answer for an in-depth description of the way Caffe stores these two types of data.
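
A quick way to see the two kinds of data side by side is through pycaffe. This is only a rough sketch: it assumes the prototxt above is saved as deploy.prototxt and that it also declares input shapes for data_a and data_b (e.g. via input_shape blocks), which the quoted snippet omits.

import caffe

# Load the deploy net without a .caffemodel; parameters are just initialized.
net = caffe.Net('deploy.prototxt', caffe.TEST)

# net.blobs: the data flowing through the net -- it changes with every new input.
print(list(net.blobs.keys()))   # ['data_a', 'data_b', 'ip1_a', 'ip1_b', 'ip1_combine', 'joint_ip']

# net.params: the learned parameters -- fixed once training is done.
print(list(net.params.keys()))  # ['ip1_a', 'ip1_b', 'joint_ip']  (Concat has no parameters)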

In the example you showed, there are two input data paths, data_a and data_b, which might hold different images each time. Each input blob passes through an InnerProduct layer to become the ip1_a and ip1_b blobs, respectively. They are then concatenated into a single blob, ip1_combine, which in turn is fed into the final InnerProduct layer.
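
Continuing the pycaffe sketch above (still only an illustration, with a hypothetical per-branch input shape of (1, D)), a single forward pass shows the blob shapes along this path:

import numpy as np

# Fill both inputs with random data and run the net once.
net.blobs['data_a'].data[...] = np.random.randn(*net.blobs['data_a'].data.shape)
net.blobs['data_b'].data[...] = np.random.randn(*net.blobs['data_b'].data.shape)
net.forward()

print(net.blobs['ip1_a'].data.shape)        # (1, 10) -- num_output: 10
print(net.blobs['ip1_b'].data.shape)        # (1, 10)
print(net.blobs['ip1_combine'].data.shape)  # (1, 20) -- Concat along the channel axis
print(net.blobs['joint_ip'].data.shape)     # (1, 30) -- num_output: 30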

On the other hand, the model has a set of parameters: ip1_w and ip1_b (the weights and bias) of the first inner-product layers. In this particular example, the layer parameters were explicitly named to indicate that they are shared between the ip1_a and ip1_b layers.
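
One rough way to convince yourself of the sharing from pycaffe (again just a sketch, not a definitive test): because both layers name their param blobs ip1_w / ip1_b, Caffe should back them with the same underlying storage, so an in-place edit of one layer's weights should be visible through the other.

import numpy as np

w_a = net.params['ip1_a'][0].data   # weights of the first branch
w_b = net.params['ip1_b'][0].data   # weights of the second branch

w_a[...] = 1.0                      # edit in place through one branch
print(np.allclose(w_a, w_b))        # expected True if the parameters are indeed shared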

As for the two lr_mult entries: yes, the first is the LR multiplier of the weights, and the second is for the bias term.
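
That ordering matches how pycaffe exposes the parameter blobs: index 0 is the weight blob and index 1 is the bias blob, in the order the param { } blocks appear. A small sketch, with D standing for the (unknown) input dimension:

layer_params = net.params['ip1_a']
print(layer_params[0].data.shape)   # weights, e.g. (10, D) for num_output: 10
print(layer_params[1].data.shape)   # bias, (10,)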

Upvotes: 1
