Reputation: 72
I am trying to implement a fully convolutional network (5 layers) in TensorFlow, but after a short period of training, all my logits fall to 0. Has anyone had the same problem before?
Here is how I implemented my CONV-ReLU-maxPOOL layer :
def conv_relu_layer (in_data, nb_filters, filter_shape) :
    # Input is expected in NHWC layout, so axis 3 holds the channels
    nb_in_channels = int (in_data.shape[3])
    conv_shape = [filter_shape[0], filter_shape[1],
                  nb_in_channels, nb_filters]
    weights = tf.Variable (
        tf.truncated_normal (conv_shape, mean=0., stddev=.05))
    bias = tf.Variable (
        tf.truncated_normal ([nb_filters], mean=0., stddev=1.))
    # Stride-1 convolution with SAME padding keeps the spatial size
    output = tf.nn.conv2d (in_data, weights,
                           [1,1,1,1], padding="SAME")
    output += bias
    output = tf.nn.relu (output)
    return output
def conv_relu_pool_layer (in_data, nb_filters, filter_shape, pool_shape,
                          pooling=tf.nn.max_pool) :
    conv_out = conv_relu_layer (in_data, nb_filters, filter_shape)
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, pool_shape[0], pool_shape[1], 1]
    return pooling (conv_out, ksize=ksize, strides=strides, padding="SAME")
Here is my network :
def create_network_5C (in_data, name="5C") :
    c1 = conv_relu_pool_layer (in_data, 64, [5,5], [2,2])
    c2 = conv_relu_pool_layer (c1, 128, [5,5], [2,2])
    c3 = conv_relu_pool_layer (c2, 256, [5,5], [2,2])
    c4 = conv_relu_pool_layer (c3, 64, [5,5], [2,2])
    return conv_relu_layer (c4, 2, [5,5])
The loss function :
def loss (logits, labels, num_classes, head=None) :
    with tf.name_scope('loss'):
        logits = tf.reshape(logits, (-1, num_classes))
        epsilon = tf.constant(value=1e-4)
        labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))
        # Epsilon keeps tf.log away from log(0)
        softmax = tf.nn.softmax(logits) + epsilon
        weighted = labels * tf.log (softmax)
        if head is not None :
            # Optional per-class weights
            weighted = tf.multiply (weighted, head)
        cross_entropy = - tf.reduce_sum (weighted, reduction_indices=[1])
        cross_entropy_mean = tf.reduce_mean (cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)
        loss = tf.add_n(tf.get_collection('losses'))
    return loss
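As a side note on the loss above: adding an epsilon to the softmax output is one way to avoid log(0). Another common approach is to compute the log-softmax directly with the log-sum-exp trick, which is what fused ops such as `tf.nn.softmax_cross_entropy_with_logits` are meant to handle. Below is a minimal pure-Python sketch of that idea for a single pixel. The helper names `log_softmax` and `cross_entropy` are illustrative and not part of the original code.

```python
import math

def log_softmax(logits):
    """Log-softmax via the log-sum-exp trick (stable for large logits)."""
    m = max(logits)
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))

# A confident, correct prediction gives a small loss
loss_good = cross_entropy([2.0, -1.0], [1.0, 0.0])
# Very large logits do not overflow, unlike a naive exp()-based softmax
loss_large = cross_entropy([1000.0, 0.0], [1.0, 0.0])
print(loss_good, loss_large)
```

No epsilon is needed here because the logarithm is never taken of a value that can round to zero.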
My main routine :
batch_size = 5
# Load data
x = tf.placeholder (tf.float32, [None, 416, 416, 3], name="x")
y = tf.placeholder (tf.float32, [None, 416, 416, 1], name="y")
# Contrast normalization and computation
x_gcn = tf.map_fn (lambda img : tf.image.per_image_standardization (img), x)
logits = create_network_5C (x_gcn)
# Having label at the same dimension as the output
y_p = tf.nn.avg_pool (tf.sign (y),
ksize=[1,16,16,1], strides=[1,16,16,1], padding="SAME")
y_rshp = tf.reshape (y_p, [batch_size, 416//16, 416//16])
y_bin = tf.cast (y_rshp > .5, tf.int32)
y_1hot = tf.one_hot (y_bin, 2)
# Compute error
error = loss (logits, y_1hot, 2)
optimizer = tf.train.AdamOptimizer (learning_rate=args.eta).minimize (error)
# Run the session
init_op = tf.global_variables_initializer ()
with tf.Session () as session :
    session.run (init_op)
    err, _ = session.run ([error, optimizer],
                          feed_dict={ x: image_batch,
                                      y: label_batch })
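To sanity-check that the label grid built in the main routine matches the network output, the spatial sizes can be worked out by hand. With SAME padding, a pooling op with stride s maps a length n to ceil(n / s), and the stride-1 SAME convolutions leave the size unchanged. The sketch below only does this arithmetic. The helper name `same_pool_out` is hypothetical.

```python
import math

def same_pool_out(size, stride):
    """Output length of a SAME-padded pooling op: ceil(size / stride)."""
    return math.ceil(size / stride)

# Network side: four 2x2 max-pool stages (c1..c4), each with stride 2
size = 416
for _ in range(4):
    size = same_pool_out(size, 2)

# Label side: a single 16x16 average pool with stride 16
label_size = same_pool_out(416, 16)

print(size, label_size)  # 26 26
```

Both sides end up on a 26 x 26 grid, which agrees with the `416//16` used in the reshape of `y_p`.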
I note that if I reduce my network to only 2 layers, the logits do not drop to 0, but the network does not learn anything either. If I reduce it to 3 layers, the logits do drop to 0, but only after many iterations (whereas with 5 layers they drop to 0 within a few batches).
Can this be linked to what is called "vanishing gradients"?
If it's relevant, my setup is: Ubuntu 16.04 - Python 3.6.4 - tensorflow 1.6.0
[EDIT] My problem really looks like the dead-ReLU problem, as mentioned in the StackOverflow question "FCN training error", but my data is normalized (to roughly between -2 and +2), and I have already tried changing the initial mean and stddev of my weights and biases.
[EDIT 2] I tried replacing the ReLUs with Leaky ReLU or softplus; in both cases, the logits get stuck below 0.1 and the loss stays between 0.6 and 0.7.
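For context on that loss range: with two classes and nearly equal logits, the softmax is about 0.5 for each class, so the cross-entropy sits near ln 2 ≈ 0.693. A loss between 0.6 and 0.7 is therefore consistent with a network that predicts roughly equal probability for both classes. The quick check below reuses the epsilon of 1e-4 from the loss function above. The function name `two_class_loss` is illustrative only.

```python
import math

def two_class_loss(logit_true, logit_other, epsilon=1e-4):
    """Per-pixel cross-entropy for the true class, using softmax + epsilon."""
    e_true, e_other = math.exp(logit_true), math.exp(logit_other)
    p_true = e_true / (e_true + e_other) + epsilon
    return -math.log(p_true)

loss_equal = two_class_loss(0.0, 0.0)    # both logits at 0
loss_slight = two_class_loss(0.1, 0.0)   # true class barely ahead
print(loss_equal, loss_slight)
```

With both logits at 0 the result is close to ln 2, and a logit gap of only 0.1 moves it just a few hundredths away.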
Upvotes: 0
Views: 118
Reputation: 72
Using leaky ReLU was actually enough; I then just needed to let it train for a huge amount of time.
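One way to see why this could help, sketched under simplifying assumptions: a standard ReLU passes zero gradient wherever its input is negative, while a leaky ReLU passes a small slope instead. The toy example below multiplies the local activation gradients along a chain of 5 units that all have negative pre-activations. It ignores the weights entirely, and `alpha=0.2` is just an illustrative slope, not a value taken from the post.

```python
def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.2):
    """Derivative of leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha

def chain_grad(pre_activations, grad_fn):
    """Product of local activation gradients along a chain of layers."""
    g = 1.0
    for x in pre_activations:
        g *= grad_fn(x)
    return g

# Five layers whose pre-activations are all negative ("dead" units)
pre_acts = [-0.5, -1.0, -0.3, -2.0, -0.1]
relu_chain = chain_grad(pre_acts, relu_grad)
leaky_chain = chain_grad(pre_acts, leaky_relu_grad)
print(relu_chain, leaky_chain)
```

With ReLU the product is exactly zero, so nothing upstream gets updated. With the leaky variant it is small but nonzero, which fits the observation that training works but needs many iterations.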
Upvotes: 0