r3t2

Reputation: 85

Logging weights and biases in Caffe

I am working on a project that requires identifying facial features in a person's face. I formulated this as a regression problem and want to start with a simple conv network, defined below.

I noticed that the predicted output was always the same, and after some more debugging I saw that the weights and gradients of the score layer do not change over iterations. I am using a fixed learning rate of ~5e-2 to generate the example below. The training loss seems to decrease as iterations progress, but I am unable to understand why. I also logged other layers ('conv1', 'conv2', 'fc1') and see the same behavior of remaining constant over iterations. Since the loss seems to decrease, something must be changing, and my guess is that the way I am logging below may not be correct.

Could you please give me some pointers on what to check? Please let me know if you need more information.

Modified lenet:

import caffe
from caffe import layers as L, params as P

# Modified LeNet. Added relu1, relu2, and dropout.
# The loss function is Euclidean distance.
def lenet(hdf5_list, batch_size=64, dropout_ratio=0.5, train=True):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()

    n.data, n.label = L.HDF5Data(batch_size=batch_size, source=hdf5_list, ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'), bias_filler=dict(type='constant', value=0.1))
    n.relu1 = L.ReLU(n.conv1, in_place=False, relu_param=dict(negative_slope=0.1))
    n.pool1 = L.Pooling(n.relu1, kernel_size=2, stride=2, pool=P.Pooling.MAX)

    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'), bias_filler=dict(type='constant', value=0.1))
    n.relu2 = L.ReLU(n.conv2, in_place=False, relu_param=dict(negative_slope=0.1))
    n.pool2 = L.Pooling(n.relu2, kernel_size=2, stride=2, pool=P.Pooling.MAX)

    if train:
        n.drop3 = fc1_input = L.Dropout(n.pool2, in_place=True, dropout_param=dict(dropout_ratio=dropout_ratio))
    else:
        fc1_input = n.pool2

    n.fc1 =   L.InnerProduct(fc1_input, num_output=500, weight_filler=dict(type='xavier'), bias_filler=dict(type='constant', value=0.1))
    n.relu3 = L.ReLU(n.fc1, in_place=True, relu_param=dict(negative_slope=0.1))
    n.score = L.InnerProduct(n.relu3, num_output=30, weight_filler=dict(type='xavier'))
    n.loss =  L.EuclideanLoss(n.score, n.label)

    return n.to_proto()
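
For completeness, this is roughly how the train/test prototxt files are generated from this function (the list file names here are placeholders for my HDF5 source lists):

with open('lenet_train.prototxt', 'w') as f:
    f.write(str(lenet('train_h5_list.txt', batch_size=64, train=True)))
with open('lenet_test.prototxt', 'w') as f:
    f.write(str(lenet('test_h5_list.txt', batch_size=64, train=False)))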

solver loop:

#custom solver loop
for it in range(niter):
    solver.step(1)

    train_loss[it] = solver.net.blobs['loss'].data

    score_weights.append(solver.net.params['score'][0].data)
    score_biases.append(solver.net.params['score'][1].data)
    score_weights_diff.append(solver.net.params['score'][0].diff)
    score_biases_diff.append(solver.net.params['score'][1].diff)


    if (it % val_interval) == 0 or (it == niter - 1):

        val_error_this = 0
        for test_it in range(niter_val_error):
            solver.test_nets[0].forward()
            val_error_this += euclidean_loss(solver.test_nets[0].blobs['score'].data , 
                                             solver.test_nets[0].blobs['label'].data) / niter_val_error
        val_error[it // val_interval] = val_error_this
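
The euclidean_loss helper used in the validation loop is not shown above; it is a small NumPy function meant to be consistent with Caffe's EuclideanLoss layer, i.e. 1/(2N) times the summed squared difference over a batch of N samples, roughly:

import numpy as np

def euclidean_loss(predictions, labels):
    # Same definition as Caffe's EuclideanLoss layer:
    # 1/(2N) * sum of squared differences, with N the batch size.
    diff = predictions - labels
    return np.sum(diff ** 2) / (2.0 * predictions.shape[0])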

printing the logged score-layer gradients:

print score_weights_diff[0].shape
for i in range(10):
    score_weights_i = score_weights_diff[i]
    print score_weights_i[0:30:10,0]


print score_biases_diff[0].shape
for i in range(5):
    score_biases_i = score_biases_diff[i]
    print score_biases_i[0:30:6]

output:

(30, 500)
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
[ -3.71852257e-05   7.34565838e-05   2.61445384e-04]
(30,)
[  3.22921231e-04   5.66378840e-05  -5.15143370e-07  -1.51118627e-04
   2.30352176e-04]
[  3.22921231e-04   5.66378840e-05  -5.15143370e-07  -1.51118627e-04
   2.30352176e-04]
[  3.22921231e-04   5.66378840e-05  -5.15143370e-07  -1.51118627e-04
   2.30352176e-04]
[  3.22921231e-04   5.66378840e-05  -5.15143370e-07  -1.51118627e-04
   2.30352176e-04]
[  3.22921231e-04   5.66378840e-05  -5.15143370e-07  -1.51118627e-04
   2.30352176e-04]

Upvotes: 1

Views: 380

Answers (1)

Shai

Reputation: 114786

It's a bit difficult to tell from your code, but it is possible that score_weights_diff, score_biases_diff, and the other lists are storing references to solver.net.params['score'][0].diff, and therefore all entries in each list are actually the same object and change together at each iteration (see the sketch after the list below).

  1. Try saving a copy:

    score_weights_diff.append(solver.net.params['score'][0].diff[...].copy())

  2. Try printing the weights/biases after each iteration to see if they change.
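
To make the aliasing concrete, here is a minimal standalone sketch (plain NumPy, no Caffe required; the array only stands in for the diff blob): appending the same array object on every iteration means every list entry points at the latest values, while appending a copy preserves the per-iteration snapshots.

import numpy as np

diff = np.zeros(3)  # stands in for solver.net.params['score'][0].diff
logged_refs, logged_copies = [], []

for it in range(3):
    diff += 1.0                        # the solver mutates the blob in place each step
    logged_refs.append(diff)           # stores a reference to the same array
    logged_copies.append(diff.copy())  # stores an independent snapshot

print(logged_refs[0])    # [3. 3. 3.] -- already overwritten by later iterations
print(logged_copies[0])  # [1. 1. 1.] -- the value at iteration 0 is preserved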

Upvotes: 1
