Vibhor Jain

Reputation: 1476

TensorFlow: Linear Regression loss increasing (instead of decreasing) with successive epochs

I'm learning TensorFlow and trying to apply it to a simple linear regression problem. The data is a numpy.ndarray of shape [42, 2].

I'm a bit puzzled why the loss increases after each successive epoch. Isn't the loss expected to go down with each successive epoch?

Here is my code (let me know if you'd like me to share the output as well). Thanks a lot for taking the time to answer.

1) created the placeholders for the independent / dependent variables

X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32,name='Y')

2) created variables for weight and bias

w = tf.Variable(0.0,name='weights')
b = tf.Variable(0.0,name='bias')

3) defined loss function & optimizer

Y_pred = X * w + b
loss = tf.reduce_sum(tf.square(Y - Y_pred), name = 'loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.001).minimize(loss)

4) created summary events & event file writer

tf.summary.scalar(name = 'weight', tensor = w)
tf.summary.scalar(name = 'bias', tensor = b)
tf.summary.scalar(name = 'loss', tensor = loss)
merged = tf.summary.merge_all()

evt_file = tf.summary.FileWriter('def_g')
evt_file.add_graph(tf.get_default_graph())

5) and execute all in a session

with tf.Session() as sess1:
    sess1.run(tf.variables_initializer(tf.global_variables()))
    for epoch in range(10):
        summary, _, l = sess1.run([merged, optimizer, loss],
                                  feed_dict={X: data[:, 0], Y: data[:, 1]})
        evt_file.add_summary(summary, epoch + 1)
        evt_file.flush()
        print("  new_loss: {}".format(sess1.run(loss, feed_dict={X: data[:, 0], Y: data[:, 1]})))

Cheers!

Upvotes: 2

Views: 1725

Answers (1)

Stephen

Reputation: 824

The short answer is that your learning rate is too big. I was able to get reasonable results by changing it from 0.001 to 0.0001, but I only used the 23 points from your second-last comment (I initially didn't notice your last comment), so using all the data might require an even lower number.
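If you keep reduce_sum, that is a one-line change (a minimal sketch, reusing the variable names from your code):

optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.0001).minimize(loss)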

0.001 seems like a really low learning rate. However, the real problem is that your loss function is using reduce_sum instead of reduce_mean. This makes your loss a large number whose gradients grow with the number of training points, which sends a very strong signal to the GradientDescentOptimizer, so it overshoots despite the low learning rate. The problem would only get worse if you added more points to your training data. So use reduce_mean to get the average squared error, and your algorithm will be much better behaved.
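For example, a minimal sketch of the suggested change, reusing your Y_pred definition:

# average squared error: gradient magnitude no longer grows with dataset size
loss = tf.reduce_mean(tf.square(Y - Y_pred), name = 'loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.001).minimize(loss)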

Upvotes: 10
