Reputation: 689
I am trying to implement the model in the picture in TensorFlow, except that instead of 6 output neurons I have 1000. I am doing this so that I can use pretrained weights.
I first implemented the full model with only one (14,14,128) layer, just for testing. Now that the program has matured, I implemented the two additional layers (or one of them). This makes the model learn nothing: the loss is constant (up to small noise) and the accuracy on the training images stays at random-guess level. Before adding the layers I could reach 70-80 percent accuracy very quickly (5-10 min) on a 1000-image subset of the training dataset. As said, this is not the case with the additional layers.
Here is the code for the additional layer, where s1 and s2 are the strides of the convolutions:
# second conv layer: 3x3, 64 -> 128 channels, stride 2
w2 = weight_variable([3, 3, 64, 128])
b2 = bias_variable([128])
h2 = tf.nn.relu(conv2d_s2(h1_pool, w2) + b2)
h2_pool = max_pool_2x2(h2)

# starts additional layer: 3x3, 128 -> 128 channels, stride 1
w3 = weight_variable([3, 3, 128, 128])
b3 = bias_variable([128])
h3 = tf.nn.relu(conv2d_s1(h2_pool, w3) + b3)
# ends additional layer

# next conv layer: 3x3, 128 -> 256 channels, stride 1
w5 = weight_variable([3, 3, 128, 256])
b5 = bias_variable([256])
h5 = tf.nn.relu(conv2d_s1(h3, w5) + b5)
h5_pool = max_pool_2x2(h5)
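For reference, the helper functions used above are not shown in the question; this is a minimal sketch of what they presumably look like, assuming standard TF1-style helpers with s1 = 1 and s2 = 2:

import tensorflow as tf

def weight_variable(shape):
    # small truncated-normal initialization for conv/fc weights
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    # small positive bias so ReLU units start active
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d_s1(x, w):
    # 2-D convolution with stride 1 and 'SAME' padding
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def conv2d_s2(x, w):
    # 2-D convolution with stride 2 and 'SAME' padding
    return tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')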
This extra layer makes the model worthless. I have tried different hyperparameters (learning rate, batch size, number of epochs) without success. Where does the problem lie?
Another question: does anyone know of a small (and/or better) network of this size that I could implement and test? My goal is to detect grasp positions on different objects (from images of the objects).
If it helps, I use one GTX 980 with a very good Xeon.
The repository can be found at https://github.com/tnikolla/grasp-detection.
UPDATE
The problem was divergence in the loss, solved by lowering the learning rate.
In orange are the accuracy and loss (TensorBoard and terminal) when the program showed no learning at all. I was fooled by the loss shown in the terminal. As pointed out by @hars, I checked the logs of accuracy and loss and discovered that the loss in TensorBoard diverges in the first steps. By changing the learning rate from 0.01 to 0.001 the divergence disappeared and, as you can see in cyan, the model was learning (it overfitted a small subset of ImageNet in 1 minute).
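For reference, the fix amounts to nothing more than changing the learning-rate argument of the optimizer; a sketch (the actual optimizer class used in the repository may differ):

# learning rate lowered from 0.01 to 0.001 to stop the loss from diverging
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)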
Upvotes: 2
Views: 1754
Reputation: 1802
You have a ReLU at the end of the model, which zeroes out all negative logits (and kills their gradients), followed by softmax-with-logits in the training part. The model can therefore get stuck in a poor local minimum.
Try removing tf.nn.relu in the last line of your inference and see if it trains well.
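To see why, here is a minimal sketch (plain NumPy, not code from the repository) of what a ReLU does to the logits before the softmax:

import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([-2.0, -0.5, -1.0])       # hypothetical raw logits, all negative
print(softmax(logits))                      # clear preference for class 1
print(softmax(np.maximum(logits, 0.0)))     # ReLU first: all zeros -> uniform output,
                                            # and the gradient through the ReLU is zero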
Here is the relevant part of your code:
Last few lines of the model:
# fc3 layer (final fully connected layer)
W_fc3 = weight_variable([512, 1000])
b_fc3 = bias_variable([1000])
output = tf.nn.relu(tf.matmul(h_fc2, W_fc3) + b_fc3)  # <-- this ReLU clips the logits
#print("output: {}".format(output.get_shape()))
return output
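A sketch of the suggested fix: return the raw logits and let tf.nn.softmax_cross_entropy_with_logits apply the softmax internally:

# fc3 layer, without the final ReLU
W_fc3 = weight_variable([512, 1000])
b_fc3 = bias_variable([1000])
output = tf.matmul(h_fc2, W_fc3) + b_fc3  # raw logits
return output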
Training part of the code:
logits = inference_redmon.inference(images)
# softmax_cross_entropy_with_logits expects raw (unactivated) logits
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=labels_one))
tf.summary.scalar('loss', loss)
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_one, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Upvotes: 2