Reputation: 321
I have written a convolutional network in TensorFlow with ReLU as the activation function, but it is not learning (the loss is constant for both the train and eval datasets). With other activation functions everything works as it should.
Here is the code where the network is created:
def _create_nn(self):
    current = tf.layers.conv2d(self.input, 20, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    self.descriptor = current = tf.layers.conv2d(current, 32, 5, activation=self.activation)
    if not self.drop_conv:
        current = tf.layers.conv2d(current, self.layer_7_filters_n, 3, activation=self.activation)
    if self.add_conv:
        current = tf.layers.conv2d(current, 48, 2, activation=self.activation)
        self.descriptor = current
    last_conv_output_shape = current.get_shape().as_list()
    self.descr_size = last_conv_output_shape[1] * last_conv_output_shape[2] * last_conv_output_shape[3]
    current = tf.layers.dense(tf.reshape(current, [-1, self.descr_size]), 100, activation=self.activation)
    current = tf.layers.dense(current, 50, activation=self.last_activation)
    return current
self.activation is set to tf.nn.relu and self.last_activation is set to tf.nn.softmax.
The loss function and optimizer are created here:
self._nn = self._create_nn()
self._loss_function = tf.reduce_sum(tf.squared_difference(self._nn, self.Y), 1)
optimizer = tf.train.AdamOptimizer()
self._train_op = optimizer.minimize(self._loss_function)
I tried changing the variable initialization by passing tf.random_normal_initializer(0.1, 0.1) as the initializer, but it did not result in any change in the loss.
I would be grateful for help in making this neural network work with ReLU.
Edit: Leaky ReLU has the same problem.
Edit: a small example where I managed to reproduce the same error:
import tensorflow as tf

x = tf.constant([[3., 211., 123., 78.]])
v = tf.Variable([0.5, 0.5, 0.5, 0.5])
h_d = tf.layers.Dense(4, activation=tf.nn.leaky_relu)
h = h_d(x)
y_d = tf.layers.Dense(4, activation=tf.nn.softmax)
y = y_d(h)
d = tf.constant([[.5, .5, 0, 0]])
The gradients (as calculated with tf.gradients) for the h_d and y_d kernels and biases are either exactly 0 or close to 0.
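A minimal sketch (assuming TF 1.x) of how those gradients can be inspected; the loss here mirrors the squared-difference loss from the question, and exact values will depend on the random initialization:

# Hypothetical gradient check, not part of the original code.
loss = tf.reduce_sum(tf.squared_difference(y, d))
grads = tf.gradients(loss, [h_d.kernel, h_d.bias, y_d.kernel, y_d.bias])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # all four gradients come out (near) zero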
Upvotes: 1
Views: 1421
Reputation: 321
Looks like the problem was the scale of the input data. With values between 0 and 255, that scale was more or less preserved in the subsequent layers, so the pre-activation outputs of the last layer differed by enough to push the softmax gradient to (almost) 0. It was observable only with ReLU-like activation functions because others, like sigmoid or softsign, kept the value ranges in the network smaller, with an order of magnitude of 1 instead of tens or hundreds.
The solution was simply to multiply the input to rescale it to 0-1; in the case of bytes, by 1/255.
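For reference, a minimal sketch of that rescaling, assuming the input placeholder holds raw byte values (the placeholder shape below is a hypothetical example, not from the original code):

# Rescale raw byte inputs (0-255) to [0, 1] before the first conv layer.
self.input = tf.placeholder(tf.float32, [None, 64, 64, 1])  # hypothetical shape
scaled = self.input * (1.0 / 255.0)
current = tf.layers.conv2d(scaled, 20, 3, activation=self.activation)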
Upvotes: 0
Reputation: 2428
In an improbable but possible case, all pre-activation outputs of some layer can be negative for all samples. The ReLU sets them all to zero, and there is no learning progress because the gradient is zero in the negative part of the ReLU (the "dying ReLU" problem).
Things that make this more probable are a small dataset, weird scaling of input features, inappropriate weight initialization, and/or few channels in intermediate layers.
Here you use random_normal_initializer with mean=0.1, so maybe your inputs are all negative and thus get mapped to negative values. Try mean=0, or rescale the input features.
You can also try a Leaky ReLU. The learning rate may also be too small or too large.
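As a rough sketch of those suggestions, reusing the tf.layers API from the question (the stddev and learning rate below are illustrative assumptions, not tuned values):

# Zero-mean initialization plus leaky ReLU; values are illustrative.
init = tf.random_normal_initializer(mean=0.0, stddev=0.1)
current = tf.layers.conv2d(self.input, 20, 3,
                           activation=tf.nn.leaky_relu,  # nonzero gradient on the negative side
                           kernel_initializer=init)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)  # try varying this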
Upvotes: 1