Dude1234

Reputation: 61

TensorFlow Linear Regression - returning NaN for weights, bias and Inf for loss

I'm trying to do the below linear regression in TensorFlow, but my output is all Inf and NaNs.

My input dataset is Y = 0.5*X + 2 + Noise, where X is drawn uniformly from [0, 1000) (size 1000) and Noise is Gaussian (mu=0.0, sigma=50).

Output:

loss= 82662.945 W= 15974.369 b 24.379812
loss= 81293050000000.0 W= -508895600.0 b -775064.06
loss= 8.250697e+22 W= 16212403000000.0 b 24692003000.0
loss= 8.373905e+31 W= -5.1649487e+17 b -786638100000000.0
loss= inf W= 1.6454498e+22 b 2.5060722e+19
loss= inf W= -5.2420755e+26 b -7.9838474e+23
loss= inf W= 1.6700204e+31 b 2.543495e+28
loss= inf W= -5.320352e+35 b -8.1030665e+32
loss= inf W= inf b inf
loss= inf W= nan b nan
loss= nan W= nan b nan
loss= nan W= nan b nan
loss= nan W= nan b nan

import tensorflow as tf
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


noise=np.random.normal(0.0,50,1000)#.astype(np.float32)
x_data=np.random.uniform(0,1000,1000)#.astype(np.float32)
y_data=0.5*x_data+2+noise#.astype(np.float32)

plt.scatter(x_data,y_data,s=0.1)
plt.show()


X=tf.placeholder(shape=(1000,),dtype=tf.float32)
Y=tf.placeholder(shape=(1000,),dtype=tf.float32)

#Learning W and b over the epochs
W=tf.get_variable(name='Weight',dtype=tf.float32,shape=(),initializer=tf.zeros_initializer())
b=tf.get_variable(name='Bias',dtype=tf.float32,shape=(),initializer=tf.zeros_initializer())

Y_pred= tf.add(tf.multiply(X, W),b)
loss = tf.reduce_mean(tf.square(Y_pred - Y))



optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(loss)

epochs=100
with tf.Session() as sess:
    init=tf.global_variables_initializer()
    sess.run(init)
    for e in range(epochs):
        _,c=sess.run([optimizer,loss],feed_dict={X: x_data,Y: y_data})
        print('loss=',c,'W=',sess.run(W),'b',sess.run(b))

    #plt.scatter(x_data, y_data, 'ro', label='Original data')
    plt.plot(x_data, sess.run(W) * x_data + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

Upvotes: 1

Views: 765

Answers (1)

Stewart_R

Reputation: 14485

You have neatly recreated a straightforward example of the exploding gradient problem.

You can read up on potential solutions, but the simplest for a toy example might be to reduce your learning rate.
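For reference, one of those other potential remedies is gradient clipping: cap each gradient before it is applied so that a single bad step cannot launch the weights off to infinity. This is a hedged sketch of my own against the question's TF 1.x graph, not something this answer prescribes, and the clip bounds of ±1.0 are arbitrary:

opt = tf.train.GradientDescentOptimizer(learning_rate=0.05)
grads_and_vars = opt.compute_gradients(loss)
# Clip every gradient into [-1, 1] before applying it (illustrative bounds only)
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]
optimizer = opt.apply_gradients(clipped)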

Intuitively, gradient descent is like trying to find your way to the valley floor by pointing in the downhill direction, taking a step, and repeating. At each stage you re-evaluate the direction based on what is downhill now. If the valley is smooth with no local low spots and your step size is small enough, you will eventually find the bottom.

The learning rate is analogous to the size of the step.

So, with too high a learning rate, you can imagine taking such a large step that you stride right across the whole valley to a point higher up the hill on the opposite side. Then you turn to point downhill again (roughly a 180° turn), face the centre of the valley, and step right across to somewhere even higher up the other side. And so on, ending up higher and higher on alternating sides of the valley.
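To make the step-size picture concrete, here is a minimal NumPy-only sketch (my addition, not part of the original post) of the same one-variable update, showing that the learning rate alone decides whether the iterates settle or blow up:

import numpy as np

np.random.seed(0)
x = np.random.uniform(0, 1000, 1000)
y = 0.5 * x + 2 + np.random.normal(0.0, 50, 1000)

def fit(lr, epochs=100):
    # Plain gradient descent on the mean squared error of y ~ W*x + b
    W, b = 0.0, 0.0
    for _ in range(epochs):
        err = W * x + b - y
        dW = 2.0 * np.mean(err * x)   # d/dW of mean((W*x + b - y)^2)
        db = 2.0 * np.mean(err)       # d/db of the same loss
        W -= lr * dW
        b -= lr * db
    return W, b, np.mean((W * x + b - y) ** 2)

print(fit(lr=0.05))       # overshoots every step: W, b and the loss run off to inf/nan
print(fit(lr=0.000001))   # small steps: W settles near the true slope of 0.5

With lr=0.05 each update scales the error in W by roughly lr * 2 * mean(x**2), which is far greater than 1 here, so the oscillation grows instead of shrinking.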

So, dramatically reducing your learning rate to something like this seems to allow it to converge:

optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.000001).minimize(loss)
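A hedged side note (mine, not the answerer's): the step has to be this tiny because x_data ranges up to 1000, so the gradient of the mean squared error with respect to W scales with mean(x**2), around 3e5 here. If you are free to rescale the inputs for this toy problem, standardising x_data lets a more ordinary learning rate work; a minimal sketch reusing the question's graph:

x_scaled = (x_data - x_data.mean()) / x_data.std()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        _, c = sess.run([optimizer, loss], feed_dict={X: x_scaled, Y: y_data})
    # W and b are learned in the scaled units; map them back to the original x scale
    W_fit, b_fit = sess.run([W, b])
    slope = W_fit / x_data.std()
    intercept = b_fit - W_fit * x_data.mean() / x_data.std()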

Upvotes: 2
