Weights of my TensorFlow FCN all drop to 0

Question

I am trying to implement a fully convolutional network (5 layers) in TensorFlow, but after a short period of training all my logits fall to 0. Has anyone had the same problem before?

Here is how I implemented my CONV-ReLU-maxPOOL layer :

import tensorflow as tf

def conv_relu_layer (in_data, nb_filters, filter_shape) :
    # Number of input channels, read from the NHWC input tensor
    nb_in_channels = int (in_data.shape[3])
    conv_shape = [filter_shape[0], filter_shape[1],
                  nb_in_channels, nb_filters]

    weights = tf.Variable (
        tf.truncated_normal (conv_shape, mean=0., stddev=.05))
    bias    = tf.Variable (
        tf.truncated_normal ([nb_filters], mean=0., stddev=1.))

    # Stride-1 convolution with SAME padding keeps the spatial size
    output = tf.nn.conv2d (in_data, weights,
                           [1,1,1,1], padding="SAME")
    output += bias
    output = tf.nn.relu (output)
    return output

def conv_relu_pool_layer (in_data, nb_filters, filter_shape, pool_shape,
                          pooling=tf.nn.max_pool) :
    conv_out = conv_relu_layer (in_data, nb_filters, filter_shape)
    ksize   = [1, pool_shape[0], pool_shape[1], 1] 
    strides = [1, pool_shape[0], pool_shape[1], 1]
    return pooling (conv_out, ksize=ksize, strides=strides, padding="SAME")

Here is my network :

def create_network_5C (in_data, name="5C") :
    c1 = conv_relu_pool_layer (in_data, 64, [5,5], [2,2])
    c2 = conv_relu_pool_layer (c1,     128, [5,5], [2,2])
    c3 = conv_relu_pool_layer (c2,     256, [5,5], [2,2])
    c4 = conv_relu_pool_layer (c3,      64, [5,5], [2,2])
    return conv_relu_layer    (c4,       2, [5,5])

The loss function :

def loss (logits, labels, num_classes) :
    with tf.name_scope('loss'):
        logits = tf.reshape(logits, (-1, num_classes))
        epsilon = tf.constant(value=1e-4)
        labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))

        softmax = tf.nn.softmax(logits) + epsilon

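        # NOTE: `head` is not defined anywhere in the posted code; it has to
        # be supplied (e.g. as an extra argument) for this function to run.
        cross_entropy = - tf.reduce_sum (
            tf.multiply (labels * tf.log (softmax), head),
            reduction_indices=[1])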

        cross_entropy_mean = tf.reduce_mean (cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)

        loss = tf.add_n(tf.get_collection('losses'))
    return loss

My main routine :

batch_size = 5
# Load data
x = tf.placeholder (tf.float32, [None, 416, 416, 3], name="x")
y = tf.placeholder (tf.float32, [None, 416, 416, 1], name="y")

# Contrast normalization and computation
x_gcn = tf.map_fn (lambda img : tf.image.per_image_standardization (img), x)
logits = create_network_5C (x_gcn)

# Having label at the same dimension as the output
y_p = tf.nn.avg_pool (tf.sign (y),
                      ksize=[1,16,16,1], strides=[1,16,16,1], padding="SAME")
y_rshp = tf.reshape (y_p, [batch_size, 416//16, 416//16])
y_bin = tf.cast (y_rshp > .5, tf.int32)
y_1hot = tf.one_hot (y_bin, 2)

# Compute error
error = loss (logits, y_1hot, 2)
optimizer = tf.train.AdamOptimizer (learning_rate=args.eta).minimize (error)

# Run the session
init_op = tf.global_variables_initializer ()
with tf.Session () as session :
    session.run (init_op)
    err, _ = session.run ([error, optimizer],
                           feed_dict={ x: image_batch,
                                       y: label_batch })
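
As a sanity check on the shapes used above, here is a minimal sketch in plain Python (not TensorFlow) of how the spatial size changes. It assumes that pooling with padding="SAME" yields ceil(size / stride) and that the stride-1 SAME convolutions keep the size unchanged; the helper name same_pool_output_size is hypothetical.

```python
import math

def same_pool_output_size(size, stride):
    """Spatial size after a pooling layer with padding="SAME" (assumed rule)."""
    return math.ceil(size / stride)

# Four 2x2 max-pool layers with stride 2, as in create_network_5C.
size = 416
for _ in range(4):
    size = same_pool_output_size(size, 2)

# One 16x16 average pool with stride 16, as applied to the labels.
label_size = same_pool_output_size(416, 16)

print(size, label_size)  # 26 26
```

Under these assumptions both the logits and the downsampled labels end up on a 26 x 26 grid, which matches the 416//16 used in the reshape.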

I noticed that if I reduce my network to only 2 layers, the logits do not drop to 0, but the network does not learn anything either. If I reduce it to 3 layers, the logits still drop to 0, but only after many iterations (whereas with 5 layers they drop to 0 within a few batches).

Can this be linked to what is called the "vanishing gradient" problem?

If it's relevant, my specs are: Ubuntu 16.04 - Python 3.6.4 - tensorflow 1.6.0

[EDIT] My problem really looks like dead ReLU, as mentioned here: StackOverflow: FCN training error, but my data is normalized (roughly between -2 and +2), and I already tried changing the initial mean and stddev of my weights and biases.

[EDIT 2] I tried replacing the ReLUs with leaky ReLU or softplus; in both cases, the logits get stuck below 0.1 and the loss stays between 0.6 and 0.7.
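
For reference, a loss between 0.6 and 0.7 is close to ln 2 ≈ 0.693, which is the cross-entropy of a two-class model that predicts 0.5 for each class. The following is a minimal sketch in plain Python (not the TensorFlow code above); the function names are illustrative, and the epsilon mirrors the 1e-4 used in the loss function.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot, epsilon=1e-4):
    """Cross-entropy of one sample, with epsilon added to the softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p + epsilon) for t, p in zip(one_hot, probs))

# Both logits equal (here 0): the prediction is 0.5 / 0.5.
collapsed = cross_entropy([0.0, 0.0], [1.0, 0.0])
chance_level = math.log(2)

print(round(collapsed, 3), round(chance_level, 3))
```

If the logits have collapsed to nearly identical values, the loss therefore sits at roughly chance level whatever the label is.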

Motiss · Accepted Answer

Using leaky ReLU was actually enough; I then just needed to let the network train for a very long time.
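
To illustrate why this can help, here is a minimal sketch in plain Python comparing the gradient that flows through a unit whose pre-activation is negative. The input values are hypothetical, and alpha=0.2 is an assumption chosen to match the default of tf.nn.leaky_relu, which is available from TensorFlow 1.4 and could replace tf.nn.relu in conv_relu_layer.

```python
def relu_grad(x):
    """Derivative of ReLU with respect to its input."""
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.2):
    """Derivative of leaky ReLU: a small slope alpha for negative inputs."""
    return 1.0 if x > 0 else alpha

# Hypothetical pre-activations of a unit that is negative for every input.
pre_activations = [-3.0, -1.5, -0.2]

relu_grads = [relu_grad(v) for v in pre_activations]
leaky_grads = [leaky_relu_grad(v) for v in pre_activations]

print(relu_grads)   # [0.0, 0.0, 0.0]
print(leaky_grads)  # [0.2, 0.2, 0.2]
```

With plain ReLU such a unit receives no gradient and cannot recover, whereas the leaky variant still passes a small gradient. That gradient is small, which may be part of why training takes a long time.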
