Reputation: 191
I'm trying to reimplement FCN in TensorFlow. I have implemented the deconvolution layer as follows:
up8_filter = tf.Variable(tf.truncated_normal([64, 64, 21, 21]))  # [height, width, out_channels, in_channels]
prob_32 = tf.nn.conv2d_transpose(score, up8_filter, output_shape=[batch_size, 224, 224, 21], strides=[1, 32, 32, 1])  # 32x upsampling back to 224x224
tf.histogram_summary('fc8_filter', up8_filter)
Training looks fine, with the loss value dropping, until it becomes NaN. I checked TensorBoard and it suggests that up8_filter diverges.
Is there a way to regularize the weight value in TensorFlow?
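Something like the following sketch is what I have in mind (assuming the old TF 0.x API; the weight_decay coefficient is just a placeholder):

weight_decay = 5e-4  # placeholder coefficient, not tuned
loss = loss + weight_decay * tf.nn.l2_loss(up8_filter)  # L2 penalty on the deconv filter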
I have tried the following methods:
I did not pad the image to 100 pixels as per the FCN implementation, since TensorFlow's conv2d does not support it. I converted the VGG weights using caffe-tensorflow; there is not much I can do to alter its network structure.
I'm sorry for the confusing question; there are so many things that could go wrong and I'm not sure where to start.
Snippet of the loss values:
Step 1: loss = 732171599872.00
Step 10: loss = 391914520576.00
Step 20: loss = 32141299712.00
Step 30: loss = 1255705344.00
[Update]:
Loss function loss32:
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    tf.reshape(prob_32, [batch_size * 224 * 224, 21]),  # logits, one row per pixel
    tf.reshape(lbl_ph, [-1])))  # sparse integer labels
[Update2]
I followed the suggestion by ziky90 and it worked. Training now converges and the deconv filter seems to have stopped diverging. I will report back on the accuracy.
Upvotes: 3
Views: 7556
Reputation: 3506
Also have a look at my TensorFlow FCN implementation. Training works when using this loss function in combination with this training script.
Here are some insights I gained when implementing FCN:
1. tf.nn.sparse_softmax_cross_entropy_with_logits can be used, but it causes numerical instabilities in some cases. See also this TensorFlow issue. I therefore decided to implement the cross entropy on top of softmax using tensor operations.
2. Reducing the training rate is useful. The Adam optimizer in combination with a learning rate of 1e-6 seems to work well.
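A minimal sketch of what such a hand-built cross entropy could look like (the epsilon constant and the one-hot conversion are my assumptions, reusing the tensor names and shapes from the question):

logits = tf.reshape(prob_32, [-1, 21])             # one row of class scores per pixel
labels = tf.one_hot(tf.reshape(lbl_ph, [-1]), 21)  # sparse labels to one-hot
softmax = tf.nn.softmax(logits)
epsilon = 1e-9                                     # assumed constant; guards against log(0)
cross_entropy = -tf.reduce_sum(labels * tf.log(softmax + epsilon), 1)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(1e-6).minimize(loss)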
Upvotes: 5
Reputation: 2697
If I compare this with the reference Caffe implementation, I see that you are not initialising the weights of the deconvolution (tf.nn.conv2d_transpose) layer by bilinear interpolation, but with tf.truncated_normal.
You can have a look at the reference implementation in Caffe here; it is called from here.
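A minimal sketch of how that bilinear initialisation could be ported to the setup in the question (the helper name is mine; the kernel maths mirrors the upsample_filt helper in the linked Caffe surgery code):

import numpy as np
import tensorflow as tf

def bilinear_filter(size, channels):
    # 2-D bilinear interpolation kernel, as in the Caffe FCN surgery code
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    kernel = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    # each channel is upsampled independently, so the kernel sits on the
    # diagonal of the [size, size, out_channels, in_channels] filter
    weights = np.zeros((size, size, channels, channels), dtype=np.float32)
    for i in range(channels):
        weights[:, :, i, i] = kernel
    return weights

up8_filter = tf.Variable(bilinear_filter(64, 21), name='up8_filter')

Starting from an interpolation filter instead of random noise gives the deconvolution layer a sensible output from the first step, which is presumably why the loss stops blowing up.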
Upvotes: 3