ycedres

Reputation: 175

NaN values for loss function (MSE) in TensorFlow

I would like to use a feedforward neural network to output a continuous real value, using TensorFlow. My input values are, of course, continuous real values too.

I want my net to have two hidden layers and to use MSE as the cost function, so I've defined it like this:

import math

import tensorflow as tf

def mse(logits, outputs):
    # Mean squared error between predictions and targets
    mse = tf.reduce_mean(tf.pow(tf.sub(logits, outputs), 2.0))
    return mse

def training(loss, learning_rate):
    # Plain gradient descent on the given loss
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(loss)
    return train_op

def inference_two_hidden_layers(images, hidden1_units, hidden2_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units],
                                stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)

    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

    with tf.name_scope('identity'):
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, 1],
                                stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([1]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases

    return logits

I'm doing batch training, and at every step I evaluate the train_op and loss ops:

_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
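For completeness, here is a minimal sketch of how everything is wired together (the placeholder names x and y, the hidden unit counts, and the num_steps/batch_inputs/batch_targets helpers are illustrative stand-ins, not my exact code):

import tensorflow as tf

# One batch of continuous inputs and one continuous target per example
x = tf.placeholder(tf.float32, shape=[None, WINDOW_SIZE])
y = tf.placeholder(tf.float32, shape=[None, 1])

logits = inference_two_hidden_layers(x, hidden1_units=128, hidden2_units=32)
loss = mse(logits, y)
train_op = training(loss, learning_rate=0.01)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for step in range(num_steps):
        # batch_inputs/batch_targets are hypothetical batch-fetching helpers
        feed_dict = {x: batch_inputs(step), y: batch_targets(step)}
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)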

The problem is that evaluating the loss sometimes yields NaN values. That does NOT happen if I use a neural network with just one hidden layer, like the following:

def inference_one_hidden_layer(inputs, hidden1_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units],
                                stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(inputs, weights) + biases)

    with tf.name_scope('identity'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, NUM_CLASSES],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden1, weights) + biases

    return logits

Why do I get NaN loss values when using a net with two hidden layers?

Upvotes: 4

Views: 7790

Answers (1)

Rob Romijnders

Reputation: 863

Mind your learning rate. If you expand your network, you have more parameters to learn, which also means you need to decrease the learning rate.

With a learning rate that is too high, your weights will explode, and your output values will explode with them; the squared error then overflows and the loss turns into NaN.
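For example, a quick way to check this (the rates here are arbitrary, and the NaN guard is just a convenient diagnostic, assuming the same loss, training and feed_dict setup as in your question):

import numpy as np
import tensorflow as tf

# Try a smaller step size for the deeper net (values are arbitrary examples)
train_op = training(loss, learning_rate=0.001)  # instead of e.g. 0.01

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for step in range(num_steps):
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
        # Abort as soon as the loss stops being finite (inf or NaN)
        if not np.isfinite(loss_value):
            print('Loss diverged at step %d: %r' % (step, loss_value))
            break

If the one-layer net trains fine at your current rate but the two-layer net only stays finite at a smaller one, that confirms the diagnosis.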

Upvotes: 5
