Tom

Reputation: 35

TensorFlow model always produces the mean

I am having trouble fitting a very simple model in TensorFlow. If one column of my input data is constant, the output always converges to the same value for all rows, namely the mean of my target data y_, even when another column of x contains enough information to reproduce y_ exactly. Here is a small example.

import tensorflow as tf

def weight_variable(shape):
    """Initialize the weights with random weights"""
    initial = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float64)
    return tf.Variable(initial)

# Initialize the training data
x = tf.constant([[1.0,1.0],[1.0,2.0],[1.0,3.0]], dtype=tf.float64)
y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64)

w = weight_variable((2,1))
y = tf.matmul(x,w)

error = tf.reduce_mean(tf.square(y_ - y))

train_step = tf.train.AdamOptimizer(1e-5).minimize(error)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Train the model, printing progress every 1000 iterations
    for i in range(1000000):
        sess.run(train_step)
        err = sess.run(error)

        if i % 1000 == 0:
            print("\nerr:", err)
            print("x: ", sess.run(x))
            print("w: ", sess.run(w))
            print("y_: ", sess.run(y_))
            print("y: ", sess.run(y))

This example always converges to w=[2,0] and y=[2,2,2]. The error is a smooth function of w with a global minimum at w=[0,1], where y=[1,2,3] and the error is zero. Why does it not converge there? I have also tried plain gradient descent and varying the learning rate.
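For reference, a quick NumPy check (outside of TensorFlow) confirms that the claimed minimum fits the data exactly:

import numpy as np

# Same data as above, evaluated at the claimed optimal weights
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
w = np.array([[0.0], [1.0]])

print(x @ w)  # [[1.], [2.], [3.]] -- the values of y_, so zero error is attainable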

Upvotes: 1

Views: 357

Answers (1)

keveman

Reputation: 8487

Your target y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64) has the shape (3,), which broadcasting treats as (1, 3). The output of tf.matmul(x, w) has the shape (3, 1), so y_ - y has the shape (3, 3) according to NumPy broadcasting rules. You are therefore not optimizing the function you thought you were optimizing. Change your y_ to the following and give it a shot:

y_ = tf.constant([[1.0],[2.0],[3.0]], dtype=tf.float64)

This should converge pretty quickly to your expected answer, even with a large learning rate.
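To make the broadcasting concrete, here is a minimal NumPy sketch (TensorFlow follows the same broadcasting rules as NumPy) of what the original shapes produce:

import numpy as np

y_ = np.array([1.0, 2.0, 3.0])       # shape (3,), like the original target
y = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like tf.matmul(x, w)

diff = y_ - y                        # (3,) broadcast against (3, 1)
print(diff.shape)                    # (3, 3) -- nine residuals, not three

With the reshaped (3, 1) target, y_ - y is an ordinary elementwise difference and the loss is the mean of the three squared residuals, as intended.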

Upvotes: 3
