24hours

Reputation: 191

Fully Convolutional Network (FCN) on TensorFlow

I'm trying to reimplement FCN in TensorFlow. I have implemented the deconvolution layer as follows:

# Deconv filter for conv2d_transpose: shape is [height, width, output_channels, in_channels]
up8_filter = tf.Variable(tf.truncated_normal([64, 64, 21, 21]))
# Upsample the score map by a factor of 32 back to the 224x224 input size
prob_32 = tf.nn.conv2d_transpose(score, up8_filter, output_shape=[batch_size, 224, 224, 21], strides=[1, 32, 32, 1])
tf.histogram_summary('fc8_filter', up8_filter)

Training looks fine, with the loss value dropping, until it becomes NaN. I checked TensorBoard and it suggests that up8_filter has diverged.

[TensorBoard histogram of up8_filter]

Is there a way to regularize the weight values in TensorFlow? (A rough sketch of what I have in mind is below the list.)
I have tried the following:

  1. Lower learning rate
  2. Zero-mean image
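
To be concrete, by "regularize" I mean something along the lines of adding a weight penalty to the loss, e.g. (a rough sketch of the idea, not something I have in my code yet; the weight_decay value is made up):

weight_decay = 5e-4  # made-up value, just to illustrate
l2_penalty = weight_decay * tf.nn.l2_loss(up8_filter)  # 0.5 * sum of squared filter values
total_loss = loss + l2_penalty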

I did not pad the image by 100 pixels as in the reference FCN implementation, since TensorFlow's conv2d does not support that. I converted the VGG weights using caffe-tensorflow, so there is not much I can do to alter its network structure.

I'm sorry for the confusing question; there are so many things that could go wrong and I'm not sure where to start.

Snippet of the loss values:

Step 1: loss = 732171599872.00
Step 10: loss = 391914520576.00
Step 20: loss = 32141299712.00
Step 30: loss = 1255705344.00

[Update]:

Loss function (loss32):

# Per-pixel softmax cross-entropy: flatten predictions to [batch*224*224, 21]
# and labels to a matching vector of class indices
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    tf.reshape(prob_32, [batch_size * 224 * 224, 21]),
    tf.reshape(lbl_ph, [-1])))

[Update 2]:

I followed the suggestion by ziky90 and it worked. The training now converges and the deconv filter seems to have stopped diverging. I will report back on the accuracy.

Upvotes: 3

Views: 7556

Answers (2)

MarvMind

Reputation: 3506

Also have a look at my TensorFlow FCN implementation. Training works when using this loss function in combination with this training script.

Here are some insights I gained when I implemented FCN.

  1. The deconv filter needs to be initialized with bilinear interpolation weights.
  2. tf.nn.sparse_softmax_cross_entropy_with_logits can be used, but it causes numerical instabilities in some cases. See also this TensorFlow issue. I therefore decided to implement the cross entropy using tensor operations (a rough sketch is below this list).
  3. When using large images (which result in large softmax batches), reducing the learning rate is useful. The Adam optimizer in combination with a learning rate of 1e-6 seems to work well.
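
For reference, the hand-rolled cross entropy from point 2 looks roughly like this (a simplified sketch; the logits/labels names and the one-hot labels are assumptions here, not the exact code from the linked loss function):

# Cross entropy built from tensor ops instead of
# tf.nn.sparse_softmax_cross_entropy_with_logits.
# logits: [batch*height*width, num_classes]; labels: one-hot, same shape.
epsilon = 1e-9  # keeps tf.log away from log(0)
softmax = tf.nn.softmax(logits)
cross_entropy = -tf.reduce_sum(labels * tf.log(softmax + epsilon),
                               reduction_indices=[1])
loss = tf.reduce_mean(cross_entropy)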

Upvotes: 5

ziky90

Reputation: 2697

If I compare this with the reference Caffe implementation, I see that you are not initialising the weights of the deconvolution/tf.nn.conv2d_transpose layer by bilinear interpolation, but with tf.truncated_normal.

You can have a look at the reference implementation in Caffe here; it is called from here.
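
A rough sketch of what bilinear initialisation can look like in TensorFlow (the helper name and the NumPy construction are my own, following the logic of the Caffe code linked above; for your layer this would be a 64x64 kernel with 21 classes):

import numpy as np
import tensorflow as tf

def bilinear_filter(kernel_size, num_classes):
    # Build a [kernel_size, kernel_size] bilinear interpolation kernel.
    factor = (kernel_size + 1) // 2
    if kernel_size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    kernel = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    # One copy of the kernel per class, so each class is upsampled independently.
    # Filter shape for conv2d_transpose is [height, width, output_channels, in_channels].
    weights = np.zeros((kernel_size, kernel_size, num_classes, num_classes), dtype=np.float32)
    for c in range(num_classes):
        weights[:, :, c, c] = kernel
    return weights

up8_filter = tf.Variable(bilinear_filter(64, 21), name='up8_filter')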

Upvotes: 3
