TensorFlow neural network loss value NaN

I'm trying to build a simple multilayer perceptron model on a large data set, but the loss value comes out as NaN. The weird thing is that after the first training step the loss is not NaN; it is about 46, which is oddly low (when I run a logistic regression model, the first loss value is about 3600). Right after that, however, the loss is constantly NaN. I also used tf.Print to try to debug it.

The goal of the model is to predict ~4500 different classes, so it's a classification problem. Using tf.Print, I see that after the first training step (or feed-forward pass through the MLP), the predicted classes coming out of the last fully connected layer (printed as their argmax) seem right: all varying numbers between 1 and 4500. After that, though, the outputs from the last fully connected layer go to either all 0's or some other constant number (0 0 0 0 0).
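One plausible reading of the collapse to `0 0 0 0 0`, sketched in NumPy rather than TensorFlow so it runs on its own: if every pre-activation of the final ReLU layer turns negative, the layer outputs all zeros, `argmax` breaks the tie by returning the first index (0), and ReLU passes back zero gradient, so those units cannot recover. The pre-activation values below are made up for illustration.

```python
import numpy as np

def relu(x):
    # ReLU: clips negative pre-activations to zero
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Derivative of ReLU (taken as 0 at x == 0): 1 where x > 0, else 0
    return (x > 0).astype(float)

# Hypothetical pre-activations of a 5-class output layer for 3 examples,
# after a bad update has pushed every value negative.
pre_activations = np.array([
    [-2.0, -0.5, -3.1, -1.2, -0.7],
    [-1.5, -4.0, -0.2, -2.2, -0.9],
    [-0.3, -0.8, -5.0, -0.1, -2.6],
])

outputs = relu(pre_activations)
predicted_classes = np.argmax(outputs, axis=1)  # ties resolve to the first index
gradients = relu_grad(pre_activations)

print(outputs)            # all zeros
print(predicted_classes)  # [0 0 0]
print(gradients.sum())    # 0.0: no gradient flows back through these units
```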

Some information about my model:

3-layer model, all fully connected layers
batch size of 1000
learning rate of 0.001 (I also tried 0.1 and 0.01, but nothing changed)
using CrossEntropyLoss (I added an epsilon value to prevent log(0))
using AdamOptimizer
learning-rate decay of 0.95
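The question does not show how `CrossEntropyLoss` is defined. Assuming it is a per-class binary cross-entropy of the form `-(y*log(p + eps) + (1 - y)*log(1 - p + eps))` (an assumption, not taken from the question), an epsilon only protects against `log(0)`. It does not help once an unbounded ReLU output exceeds 1, because `1 - p + eps` becomes negative and its log is NaN. A NumPy sketch with a hypothetical epsilon:

```python
import numpy as np

EPSILON = 1e-5  # hypothetical epsilon value

def binary_cross_entropy(predictions, labels, epsilon=EPSILON):
    # Per-class binary cross-entropy with an epsilon guard against log(0).
    # Assumed form; the question's actual CrossEntropyLoss is not shown.
    with np.errstate(invalid="ignore"):  # silence NumPy's log-of-negative warning
        losses = -(labels * np.log(predictions + epsilon)
                   + (1.0 - labels) * np.log(1.0 - predictions + epsilon))
    return losses.sum(axis=1).mean()

labels = np.array([[0.0, 1.0, 0.0]])

# Outputs in [0, 1] (e.g. from a sigmoid): finite loss.
print(binary_cross_entropy(np.array([[0.1, 0.8, 0.0]]), labels))

# Unbounded ReLU output above 1: log of a negative number gives NaN.
print(binary_cross_entropy(np.array([[46.0, 0.0, 0.0]]), labels))  # nan
```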

The exact code for the model is below (I'm using the TF-Slim library):

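import tensorflow as tf                  # TensorFlow 1.x (tf.Print, contrib)
import tensorflow.contrib.slim as slim   # TF-Slim

# Body of the model-building function; model_input and vocab_size are its arguments.
input_layer = slim.fully_connected(model_input, 5000, activation_fn=tf.nn.relu)
hidden_layer = slim.fully_connected(input_layer, 5000, activation_fn=tf.nn.relu)
output = slim.fully_connected(hidden_layer, vocab_size, activation_fn=tf.nn.relu)
# Print the predicted class index (argmax) of up to 20 examples, for the first 10 runs
output = tf.Print(output, [tf.argmax(output, 1)], 'out = ', summarize=20, first_n=10)
return {"predictions": output}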
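A common alternative (not something the question or the answers confirm as the cause) is to leave the last layer linear (`activation_fn=None` in `slim.fully_connected`) and pass the raw logits to a loss that applies the softmax internally, such as TensorFlow's `tf.nn.sparse_softmax_cross_entropy_with_logits`. The NumPy sketch below shows a numerically stable log-sum-exp formulation of that same loss; it assumes exactly one integer class label per example and is not TensorFlow's implementation.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    # Mean softmax cross-entropy computed directly from raw (linear) logits.
    # logits: (batch, num_classes) floats; labels: (batch,) integer class ids.
    # Subtracting the row max before exponentiating keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    # -log softmax of the true class = log_sum_exp - shifted[true class]
    true_class_logits = shifted[np.arange(len(labels)), labels]
    return np.mean(log_sum_exp - true_class_logits)

# Very large logits, like those an unbounded layer can produce, stay finite.
logits = np.array([[1000.0, 990.0, -50.0],
                   [3.0, 1.0, 0.2]])
labels = np.array([0, 2])
print(sparse_softmax_cross_entropy(logits, labels))
```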

Any help would be greatly appreciated! Thank you so much!

Upvotes: 2

Answers (3)

Jjoseph

Reputation: 214

From my understanding, ReLU doesn't put a cap on the upper bound of a neural network's activations, so it's more likely to diverge, depending on the implementation.

Try switching all the activation functions to tanh or sigmoid. ReLU is generally used in the convolutional layers of CNNs.
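To illustrate the point about bounds: ReLU passes large positive values through unchanged, whereas tanh and sigmoid squash them into (-1, 1) and (0, 1). If such unbounded values later reach an exponential (for example a naively written softmax), `exp` overflows to `inf` and `inf / inf` yields NaN. A small NumPy sketch with made-up values:

```python
import numpy as np

def naive_softmax(x):
    # Softmax without the max-subtraction trick; overflows for large inputs.
    exps = np.exp(x)
    return exps / exps.sum()

pre_activations = np.array([800.0, 750.0, -3.0])  # hypothetical large values

relu_out = np.maximum(pre_activations, 0.0)            # unbounded above
tanh_out = np.tanh(pre_activations)                    # bounded in (-1, 1)
sigmoid_out = 1.0 / (1.0 + np.exp(-pre_activations))   # bounded in (0, 1)

with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(relu_out))   # first two entries are NaN
print(naive_softmax(tanh_out))       # all finite
print(sigmoid_out)
```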

It's also difficult to determine whether you're diverging because of the cross-entropy, since we don't know how you applied your epsilon value. Try just using the residual; it's much simpler but still effective.
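The answer does not define "the residual". If it means the difference between the network output and a one-hot target vector (an assumption), a squared-residual (mean squared error) loss could look like the NumPy sketch below. Because it involves no logarithm, it cannot produce NaN from `log` of a non-positive value, although very large outputs still make it grow without bound.

```python
import numpy as np

def squared_residual_loss(outputs, labels, num_classes):
    # Mean over the batch of the summed squared residuals against one-hot targets.
    one_hot = np.eye(num_classes)[labels]
    residuals = outputs - one_hot
    return np.mean(np.sum(residuals ** 2, axis=1))

outputs = np.array([[0.2, 0.7, 0.1],
                    [46.0, 0.0, 0.0]])  # second row: a large, unbounded output
labels = np.array([1, 0])
print(squared_residual_loss(outputs, labels, num_classes=3))
```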

Also, a 5000-5000-4500 neural network is huge. It's unlikely you actually need a network that large.
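To put the size in numbers: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The input dimension is not given in the question, so it is a parameter below; the value 1024 in the usage line is hypothetical, and 4500 stands in for the approximate `vocab_size`.

```python
def dense_params(n_in, n_out):
    # Weights (n_in * n_out) plus one bias per output unit.
    return n_in * n_out + n_out

def mlp_params(input_dim, layer_sizes):
    # Total parameters of a stack of fully connected layers.
    total = 0
    n_in = input_dim
    for n_out in layer_sizes:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

# input_dim=1024 is a hypothetical feature size; 4500 approximates vocab_size.
print(mlp_params(1024, [5000, 5000, 4500]))  # 52634500
```

Even with a modest input size, the two 5000-wide layers alone account for roughly 47.5 million parameters.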

Upvotes: 0

OZ13

Reputation: 256

Two (possibly more) reasons why it doesn't work:

You skipped or inappropriately applied feature scaling of your inputs and outputs. As a result, the data may be difficult for TensorFlow to handle.
Using ReLU, whose derivative is discontinuous at zero, may raise issues. Try other activation functions, such as tanh or sigmoid.
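A minimal sketch of one common form of feature scaling in NumPy: standardizing each input column to zero mean and unit variance, using statistics computed on the training set. The small epsilon guarding against constant (zero-variance) columns and the sample values are assumptions for illustration.

```python
import numpy as np

def fit_standardizer(train_features, epsilon=1e-8):
    # Per-feature mean and standard deviation, computed on training data only.
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    return mean, std + epsilon  # epsilon avoids division by zero for constant features

def standardize(features, mean, std):
    # Rescale each column to roughly zero mean and unit variance.
    return (features - mean) / std

# Hypothetical features on very different scales.
train = np.array([[1000.0, 0.5],
                  [3000.0, 0.7],
                  [2000.0, 0.6]])
mean, std = fit_standardizer(train)
scaled = standardize(train, mean, std)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

The same `mean` and `std` would then be reused for validation and test data, so all splits are scaled identically.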

Upvotes: 3

Arnaud De Broissia

Reputation: 659

For some reason, your training process has diverged, and you may have infinite values in your weights, which gives NaN losses. There can be many causes; try changing your training parameters (for example, use smaller batches for testing).
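Once any value becomes infinite, ordinary arithmetic produces NaN (for example `inf - inf` or `0 * inf`), and the NaN spreads to everything computed from it. Two common safeguards are checking for non-finite values and clipping gradients by their global norm; in TensorFlow 1.x the corresponding ops are `tf.check_numerics` and `tf.clip_by_global_norm`. The NumPy sketch below only mirrors the idea and is not their implementation.

```python
import numpy as np

inf = float("inf")
print(inf - inf, 0.0 * inf)  # nan nan

def clip_by_global_norm(gradients, max_norm):
    # Scale all gradient arrays together so their combined L2 norm is at most max_norm.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    if not np.isfinite(global_norm):
        # Non-finite gradients: stop instead of writing inf/NaN into the weights.
        raise FloatingPointError("non-finite gradient norm; training has diverged")
    scale = min(1.0, max_norm / global_norm) if global_norm > 0 else 1.0
    return [g * scale for g in gradients], global_norm

# Hypothetical gradients for two weight tensors; combined norm is 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm, clipped)
```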

Also, using a ReLU for the final output of a classifier is not the usual approach; try using a sigmoid.
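When the output uses a sigmoid, a common way to keep the loss stable is to leave the layer linear and compute the sigmoid cross-entropy directly from the logits. The documentation for TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits` gives the formulation `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for logits `x` and labels `z`. The NumPy sketch below implements that formula and compares it with the naive sigmoid-then-log version on a hypothetical large logit.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    # Stable elementwise form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    return (np.maximum(logits, 0.0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

def naive_sigmoid_cross_entropy(logits, labels):
    # Applies the sigmoid first, then the log; breaks when the sigmoid saturates.
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

logits = np.array([0.5, 40.0])  # hypothetical; sigmoid(40) rounds to exactly 1.0
labels = np.array([1.0, 0.0])

print(sigmoid_cross_entropy_with_logits(logits, labels))  # finite: about [0.474, 40.0]
with np.errstate(divide="ignore"):
    print(naive_sigmoid_cross_entropy(logits, labels))     # second entry is inf
```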

Upvotes: 0
