piccolo

Reputation: 2217

Simple TensorFlow architecture not training

I am training a minimal network to learn the identity mapping: the input x is a single number, and it is multiplied by a weight w to give the output y.

The weight w is initialized to 0.5 and should move towards 1.0, the true value. However, after training the network, the weight is still close to 0.5.

import tensorflow as tf
tf.reset_default_graph()
sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None])

with tf.variable_scope('weight', reuse=True):
    w = tf.Variable([0.5])

weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='weight')

y = w*x
loss = tf.reduce_mean(y-x)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=weights)
sess.run(tf.global_variables_initializer())
sess.run(train_step, feed_dict= {x:[2.0,3.5,4.6,7.8,6.5],y:[2.0,3.5,4.6,7.8,6.5]})

print(sess.run(weights))
#[array([ 0.49900001], dtype=float32)]

For such a simple problem, I expected w to converge to 1.0 quickly.

EDIT:

When I trained this for multiple epochs

for _ in range(10000):
    sess.run(train_step, feed_dict= {x:[2.0,3.5,4.6,7.8,6.5],y:[2.0,3.5,4.6,7.8,6.5]})

the weights diverge to:

[array([-99.50284576], dtype=float32)]

EDIT 2:

I have also found that the loss is always computed as zero, and I am not sure why.

import numpy as np

data = [np.random.randn() for _ in range(100)]

for _ in range(100):
    _, loss_val = sess.run([train_step,loss] , feed_dict= {x:data,y:data})
    print ('loss = ' , loss_val)

Output:

loss =  0.0
loss =  0.0
loss =  0.0
loss =  0.0
loss =  0.0
loss =  0.0
...

Upvotes: 1

Views: 102

Answers (1)

BugKiller

Reputation: 1488

1> Use mean squared error (MSE) as the cost function.

2> Add one more placeholder for the true target.

import tensorflow as tf
tf.reset_default_graph()
sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None])
# placeholder for true target
y = tf.placeholder(tf.float32, shape=[None])

with tf.variable_scope('weight', reuse=True):
    w = tf.Variable([0.5])

weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='weight')

y_pred = w*x
# we choose mse as cost function
loss = tf.reduce_mean((y_pred-y)**2)
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=weights)
sess.run(tf.global_variables_initializer())
for _ in range(10000):
    sess.run(train_step, feed_dict= {x:[2.0,3.5,4.6,7.8,6.5],
                                     y:[2.0,3.5,4.6,7.8,6.5]})

print(w.eval())

output: [1.]

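In your code, the prediction w*x never takes effect: y is the tensor w*x, and because you always feed a constant array for y in feed_dict, the fed value overrides the computed one. The loss is therefore evaluated on the fed data rather than on the prediction, which is why it prints as 0.0.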
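The choice of MSE also matters on its own. A signed loss such as mean(w*x - t) is linear in w, so its gradient is the constant mean(x) and there is no minimum to converge to. The rough sketch below illustrates this without TensorFlow. It uses plain gradient descent in place of Adam, which is an assumption made to keep it self-contained, and the helper names (signed_loss_grad, mse_grad, descend) are hypothetical.

```python
# Plain-Python sketch (no TensorFlow); plain gradient descent stands in for Adam.
xs = [2.0, 3.5, 4.6, 7.8, 6.5]   # inputs from the question
ts = xs                           # identity mapping: targets equal inputs
n = len(xs)

def signed_loss_grad(w):
    # d/dw mean(w*x - t) = mean(x): constant, independent of w
    return sum(xs) / n

def mse_grad(w):
    # d/dw mean((w*x - t)**2) = mean(2 * (w*x - t) * x)
    return sum(2.0 * (w * x - t) * x for x, t in zip(xs, ts)) / n

def descend(grad_fn, w=0.5, lr=1e-3, steps=10000):
    # Hypothetical helper: repeatedly step against the gradient.
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

w_signed = descend(signed_loss_grad)
w_mse = descend(mse_grad)
print(w_signed)  # keeps decreasing the longer it runs, far below 0.5
print(w_mse)     # settles close to 1.0
```

With the signed loss, w just keeps moving in one direction for as long as training runs, which matches the divergence to a large negative value seen in the question. With the squared error, the gradient shrinks to zero as w approaches 1.0.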

Upvotes: 1
