Reputation: 35
I am having trouble fitting a very simple model in TensorFlow. If one column of my input data is constant, the output always converges to the same value for every row, namely the mean of my output data y_, even when another column of x_ carries enough information to reproduce y_ exactly. Here is a small example.
import tensorflow as tf

def weight_variable(shape):
    """Initialize the weights with random weights"""
    initial = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float64)
    return tf.Variable(initial)

# Initialize my data
x = tf.constant([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], dtype=tf.float64)
y_ = tf.constant([1.0, 2.0, 3.0], dtype=tf.float64)
w = weight_variable((2, 1))
y = tf.matmul(x, w)
error = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.AdamOptimizer(1e-5).minimize(error)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # Train the model and print the state every 1000 iterations
    for i in range(1000000):
        sess.run(train_step)
        err = sess.run(error)
        if i % 1000 == 0:
            print "\nerr:", err
            print "x: ", sess.run(x)
            print "w: ", sess.run(w)
            print "y_: ", sess.run(y_)
            print "y: ", sess.run(y)
This example always converges to w=[2,0] and y=[2,2,2]. The error is a smooth function of w with a minimum of zero at w=[0,1], where y=[1,2,3]. Why does it not converge to that minimum? I have also tried plain gradient descent, and I have tried varying the learning rate.
Upvotes: 1
Views: 357
Reputation: 8487
Your target y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64) has the shape (3,), which broadcasts as (1, 3). The output of tf.matmul(x, w) has the shape (3, 1). Thus y_ - y has the shape (3, 3) according to NumPy broadcasting rules, so you are not actually optimizing the function you thought you were optimizing: tf.reduce_mean averages the squared difference over all nine pairs, and that loss is minimized when every prediction equals the mean of y_, which is exactly the y = [2,2,2] you observed.
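You can verify the broadcast directly. A minimal sketch (the names a and b are just for illustration):

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], dtype=tf.float64)        # shape (3,)
b = tf.constant([[1.0], [2.0], [3.0]], dtype=tf.float64)  # shape (3, 1)
diff = a - b                                              # broadcasts to shape (3, 3)
print diff.get_shape()                                    # prints (3, 3)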
Change your y_ to the following and give it a shot:
y_ = tf.constant([[1.0],[2.0],[3.0]], dtype=tf.float64)
This should converge pretty quickly to your expected answer, even with a large learning rate.
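Alternatively, if you prefer to keep the flat constant, you can add the missing dimension with tf.expand_dims (a sketch, where y_flat is just an illustrative name):

import tensorflow as tf

y_flat = tf.constant([1.0, 2.0, 3.0], dtype=tf.float64)  # shape (3,)
y_ = tf.expand_dims(y_flat, 1)                           # shape (3, 1), matches tf.matmul(x, w)

Either way, the point is that y_ and y must have the same shape before the subtraction.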
Upvotes: 3