Reputation: 321
I have written a convolutional network in TensorFlow with ReLU as the activation function, but it is not learning (the loss is constant for both the train and eval datasets). With other activation functions everything works as it should.
Here is the code where the network is created:
def _create_nn(self):
    current = tf.layers.conv2d(self.input, 20, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    self.descriptor = current = tf.layers.conv2d(current, 32, 5, activation=self.activation)
    if not self.drop_conv:
        current = tf.layers.conv2d(current, self.layer_7_filters_n, 3, activation=self.activation)
    if self.add_conv:
        current = tf.layers.conv2d(current, 48, 2, activation=self.activation)
        self.descriptor = current
    last_conv_output_shape = current.get_shape().as_list()
    self.descr_size = last_conv_output_shape[1] * last_conv_output_shape[2] * last_conv_output_shape[3]
    current = tf.layers.dense(tf.reshape(current, [-1, self.descr_size]), 100, activation=self.activation)
    current = tf.layers.dense(current, 50, activation=self.last_activation)
    return current
self.activation is set to tf.nn.relu and self.last_activation is set to tf.nn.softmax.
The loss function and optimizer are created here:
self._nn = self._create_nn()
self._loss_function = tf.reduce_sum(tf.squared_difference(self._nn, self.Y), 1)
optimizer = tf.train.AdamOptimizer()
self._train_op = optimizer.minimize(self._loss_function)
I tried changing the variable initialization by passing tf.random_normal_initializer(0.1, 0.1) as the initializer, but it did not result in any change in the loss.
I would be grateful for help in making this neural network work with ReLU.
Edit: Leaky ReLU has the same problem.
Edit: a small example where I managed to reproduce the same error:
import tensorflow as tf

x = tf.constant([[3., 211., 123., 78.]])
v = tf.Variable([0.5, 0.5, 0.5, 0.5])
h_d = tf.layers.Dense(4, activation=tf.nn.leaky_relu)
h = h_d(x)
y_d = tf.layers.Dense(4, activation=tf.nn.softmax)
y = y_d(h)
d = tf.constant([[.5, .5, 0, 0]])
The gradients (as calculated with tf.gradients) for the h_d and y_d kernels and biases are either exactly 0 or close to 0.
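A minimal sketch (assuming TF 1.x) of how those gradients can be inspected; the loss here mirrors the squared-difference loss from the question, and exact values will depend on the random initialization:

# Hypothetical gradient check, not part of the original code.
loss = tf.reduce_sum(tf.squared_difference(y, d))
grads = tf.gradients(loss, [h_d.kernel, h_d.bias, y_d.kernel, y_d.bias])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # all four gradients come out (near) zero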
Upvotes: 1
Views: 1421
Reputation: 321
Looks like the problem was the scale of the input data. With values between 0 and 255, that scale was more or less preserved in the subsequent layers, so the pre-activation outputs of the last layer differed by enough to push the softmax gradient to (almost) 0. It was observable only with ReLU-like activation functions because others, like sigmoid or softsign, kept the value ranges in the network smaller, with an order of magnitude of 1 instead of tens or hundreds.
The solution was simply to multiply the input to rescale it to 0-1; in the case of bytes, by 1/255.
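For reference, a minimal sketch of that rescaling, assuming the input placeholder holds raw byte values (the placeholder shape below is a hypothetical example, not from the original code):

# Rescale raw byte inputs (0-255) to [0, 1] before the first conv layer.
self.input = tf.placeholder(tf.float32, [None, 64, 64, 1])  # hypothetical shape
scaled = self.input * (1.0 / 255.0)
current = tf.layers.conv2d(scaled, 20, 3, activation=self.activation)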
Upvotes: 0
Reputation: 2428
In an improbable but possible case, all pre-activation outputs of some layer can be negative for all samples. The ReLU sets them all to zero, and there is no learning progress because the gradient is zero in the negative part of the ReLU (the "dying ReLU" problem).
Things that make this more probable are a small dataset, weird scaling of input features, inappropriate weight initialization, and/or few channels in intermediate layers.
Here you use random_normal_initializer with mean=0.1, so maybe your inputs are all negative and thus get mapped to negative values. Try mean=0, or rescale the input features.
You can also try a Leaky ReLU. The learning rate may also be too small or too large.
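As a rough sketch of those suggestions, reusing the tf.layers API from the question (the stddev and learning rate below are illustrative assumptions, not tuned values):

# Zero-mean initialization plus leaky ReLU; values are illustrative.
init = tf.random_normal_initializer(mean=0.0, stddev=0.1)
current = tf.layers.conv2d(self.input, 20, 3,
                           activation=tf.nn.leaky_relu,  # nonzero gradient on the negative side
                           kernel_initializer=init)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)  # try varying this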
Upvotes: 1