Rookie

Reputation: 161

Simple linear regression fails to converge in TensorFlow

I am new to machine learning and TensorFlow. I am trying to follow the tutorial's logic to build a simple linear regression model of the form y = a*x (there is no bias term here). However, for some reason the model fails to converge to the correct value of "a". I created the data set myself in Excel, as shown below:

[Image: screenshot of the Excel data set]

Here is my code, which runs TensorFlow on this dummy data set:

import tensorflow as tf
import pandas as pd

w = tf.Variable([[5]],dtype=tf.float32)
b = tf.Variable([-5],dtype=tf.float32)
x = tf.placeholder(shape=(None,1),dtype=tf.float32)
y = tf.add(tf.matmul(x,w),b)

label = tf.placeholder(dtype=tf.float32)
loss = tf.reduce_mean(tf.squared_difference(y,label))

data = pd.read_csv("D:\\dat2.csv")
xs = data.iloc[:,:1].as_matrix()
ys = data.iloc[:,1].as_matrix()
optimizer = tf.train.GradientDescentOptimizer(0.000001).minimize(loss)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

for i in range(10000):
    sess.run(optimizer,{x:xs,label:ys})
    if i%100 == 0:  print(i,sess.run(w))
print(sess.run(w))

Below is the printout from the IPython console. As you can see, after the 10000th iteration the value of w is around 4.53 instead of the correct value, 6. I would really appreciate it if anyone could shed some light on what is going wrong here. I have tried learning rates from 0.01 to 0.0000001, and none of them makes w converge to 6. I have also read suggestions to normalize the features to a standard normal distribution. Is this normalization a must? Without it, is gradient descent unable to find the solution? Thank you very much!

[Image: IPython console output showing w at each logged iteration]

Upvotes: 1

Views: 885

Answers (1)

gdelab

Reputation: 6220

It is a shape problem: y and label don't have the same shape ([batch_size, 1] vs. [batch_size]). In loss = tf.reduce_mean(tf.squared_difference(y, label)), TensorFlow broadcasts the two tensors against each other, so the difference has shape [batch_size, batch_size] and every prediction is compared with every label instead of only with its own. The result is that your loss is not at all the one you want.
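To make the shape mismatch concrete, here is a small NumPy sketch (NumPy follows the same broadcasting rules for this case). The three-point data set is made up for illustration and is not the asker's CSV; the weight is set to the correct value 6 on purpose, to show that the mismatched loss is still far from zero.

```python
import numpy as np

# Hypothetical toy data following y = 6 * x (not the asker's CSV).
xs = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1), like the placeholder x
ys = 6.0 * xs[:, 0]                    # shape (3,), like data.iloc[:, 1]

w = np.array([[6.0]])                  # already the "correct" weight
b = np.array([0.0])
y = xs @ w + b                         # shape (3, 1)

# (3, 1) minus (3,) broadcasts to (3, 3): every prediction vs. every label.
diff_broadcast = y - ys
# Flattening y first gives the intended element-wise difference.
diff_matched = y.reshape(-1) - ys

loss_broadcast = np.mean(diff_broadcast ** 2)
loss_matched = np.mean(diff_matched ** 2)

print(diff_broadcast.shape, loss_broadcast)
print(diff_matched.shape, loss_matched)
```

With a perfect fit the matched loss is 0, while the broadcast loss is not, so gradient descent on the broadcast loss has no reason to stay at w = 6.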

To correct that, simply replace

y = tf.add(tf.matmul(x, w), b)

with

y = tf.add(tf.matmul(x, w), b)
y = tf.reshape(y, shape=[-1])
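To see what the reshape changes during training, here is a rough NumPy sketch of plain gradient descent on both versions of the loss. Everything in it is an assumption made for illustration: the toy data (x = 1..5, y = 6x), the helper name fit, the learning rate, and the step count. It is not the asker's data or the TensorFlow optimizer itself.

```python
import numpy as np

def fit(xs, ys, broadcast_bug, lr=0.05, steps=5000):
    """Gradient descent on mean squared error for y = w * x + b (hypothetical helper)."""
    w, b = 5.0, -5.0                            # same starting point as in the question
    for _ in range(steps):
        pred = w * xs + b                       # shape (N,)
        if broadcast_bug:
            # Mimics (N, 1) vs (N,): every prediction against every label.
            diff = pred[:, None] - ys[None, :]  # shape (N, N)
            x_term = xs[:, None]                # x_i repeated along each row
        else:
            diff = pred - ys                    # shape (N,), the intended pairing
            x_term = xs
        grad_w = 2.0 * np.mean(diff * x_term)   # d(loss)/dw
        grad_b = 2.0 * np.mean(diff)            # d(loss)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data following y = 6 * x.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = 6.0 * xs

w_ok, b_ok = fit(xs, ys, broadcast_bug=False)
w_bug, b_bug = fit(xs, ys, broadcast_bug=True)
print("matched shapes:  ", w_ok, b_ok)
print("broadcast shapes:", w_bug, b_bug)
```

On this toy data the matched version settles near w = 6, b = 0, without any feature normalization. The broadcast version instead drifts toward w = 0 with b near the mean of the labels, since that is what minimizes the all-pairs loss. That is consistent with w moving away from 6 in the question.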

My full working code below:

import tensorflow as tf
import pandas as pd

w = tf.Variable([[4]], dtype=tf.float64)
b = tf.Variable([10.0], dtype=tf.float64, trainable=True)
x = tf.placeholder(shape=(None, 1), dtype=tf.float64)
y = tf.add(tf.matmul(x, w), b)
y = tf.reshape(y, shape=[-1])
label = tf.placeholder(shape=(None), dtype=tf.float64)
loss = tf.reduce_mean(tf.squared_difference(y, label))

my_path = "/media/sf_ShareVM/data2.csv"
data = pd.read_csv(my_path, sep=";")
max_n_samples_to_use = 50
xs = data.iloc[:max_n_samples_to_use, :1].as_matrix()
ys = data.iloc[:max_n_samples_to_use, 1].as_matrix()
lr = 0.000001
optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(loss)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

for i in range(100000):
    _, loss_value, w_value, b_value, y_val, lab_val = sess.run([optimizer, loss, w, b, y, label], {x: xs, label: ys})
    if i % 100 == 0:
        print(i, loss_value, w_value, b_value)
    # We use a smaller LR at first to avoid exploding gradients.
    # It would be MUCH cleaner to use gradient clipping (by global norm).
    if i % 2000 == 0 and 0 < i < 10000:
        lr *= 2
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr).minimize(loss)

print(sess.run(w))
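For reference, the gradient clipping by global norm mentioned in the comment above can be sketched in NumPy as follows. The helper and the example gradients are hypothetical; in TF 1.x the same idea would, as far as I recall, go through optimizer.compute_gradients, tf.clip_by_global_norm and optimizer.apply_gradients rather than minimize.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    # scale is 1 when the norm is already small enough, below 1 otherwise.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Hypothetical gradients for w (shape (1, 1)) and b (shape (1,)).
grads = [np.array([[3.0]]), np.array([4.0])]
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm, clipped)

# Gradients whose joint norm is already below clip_norm are left unchanged.
small, _ = clip_by_global_norm([np.array([0.3]), np.array([0.4])], clip_norm=1.0)
print(small)
```

Because all gradients are scaled by the same factor, the update direction is preserved and only its length is capped, which is why a fixed learning rate can be used instead of the doubling schedule.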

Upvotes: 1
