Reputation: 41
Hi,
I am trying to write a small program to solve a regression problem. My dataset consists of 4 random x values (x1, x2, x3 and x4) and 1 y value. One of the rows looks like this:
0.634585 0.552366 0.873447 0.196890 8.75
I now want to predict the y-value as closely as possible, so after training I would like to evaluate how good my model is by showing the loss. Unfortunately I always receive
Training cost= nan
The most important lines of code are:
X_data = tf.placeholder(shape=[None, 4], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
# Input neurons : 4
# Hidden neurons : 8
# Output neurons : 1
hidden_layer_nodes = 8
w1 = tf.Variable(tf.random_normal(shape=[4,hidden_layer_nodes])) # Inputs -> Hidden Layer
b1 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes])) # First Bias
w2 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes,1])) # Hidden Layer -> Outputs
b2 = tf.Variable(tf.random_normal(shape=[1])) # Second Bias
hidden_output = tf.nn.relu(tf.add(tf.matmul(X_data, w1), b1))
final_output = tf.nn.relu(tf.add(tf.matmul(hidden_output, w2), b2))
loss = tf.reduce_mean(-tf.reduce_sum(y_target * tf.log(final_output), axis=0))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
steps = 10000
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train, feed_dict={X_data: X_train, y_target: y_train})
        # PRINT OUT A MESSAGE EVERY 500 STEPS
        if i % 500 == 0:
            print('Currently on step {}'.format(i))
    training_cost = sess.run(loss, feed_dict={X_data: X_test, y_target: y_test})
    print("Training cost=", training_cost)
Maybe someone knows where my mistake is, or even better, how to continuously show the error during training :) I know how this is done with tf.estimator, but not without it. If you need the dataset, let me know.
Cheers!
Upvotes: 1
Views: 269
Reputation: 1829
This is because the ReLU activation function causes the gradient to explode. Therefore, you need to reduce the learning rate accordingly. You can also try a different activation function (for this you may have to normalize your dataset first).
(In simple multi-layer FFNN only ReLU activation function doesn't converge) describes a similar problem to your case. Follow the answer there and you will understand.
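As a rough sketch of what those changes could look like when applied to your snippet: below I use normalized inputs, a tanh hidden layer instead of ReLU, and a smaller learning rate. I am assuming your X_train and y_train are NumPy arrays of shape (n, 4) and (n, 1), and I also swapped your log-based loss for a plain squared error since this is a regression target; that loss change is my own assumption, not something from your post.
import tensorflow as tf

# normalize the features so they suit a saturating activation like tanh
X_train = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

X_data = tf.placeholder(shape=[None, 4], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

hidden_layer_nodes = 8
w1 = tf.Variable(tf.random_normal(shape=[4, hidden_layer_nodes]))
b1 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes]))
w2 = tf.Variable(tf.random_normal(shape=[hidden_layer_nodes, 1]))
b2 = tf.Variable(tf.random_normal(shape=[1]))

hidden_output = tf.nn.tanh(tf.matmul(X_data, w1) + b1)     # tanh instead of ReLU
final_output = tf.matmul(hidden_output, w2) + b2            # linear output for regression

loss = tf.reduce_mean(tf.square(final_output - y_target))  # squared-error loss (my assumption)
train = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(loss)  # smaller learning rate

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        sess.run(train, feed_dict={X_data: X_train, y_target: y_train})
        if i % 500 == 0:
            # print the loss so you can watch the error during training
            print('step {}: loss = {}'.format(
                i, sess.run(loss, feed_dict={X_data: X_train, y_target: y_train})))
If the cost still becomes nan, reducing the learning rate further is the first thing to try.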
Hope this helps.
Upvotes: 1