Reputation: 13
import tensorflow as tf
import numpy as np
#data generation
x_data = np.float32(np.random.rand(2, 100))
y_data = np.dot([0.1, 0.2], x_data) + 0.3
#linear model
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b
#minimize squared error
loss = tf.reduce_sum(tf.square(y - y_data)) #why can't I use sum here?
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
#initialization
init = tf.global_variables_initializer()
#graph initialization
sess = tf.Session()
sess.run(init)
#train network
for step in range(201):
    sess.run(train)
    #if step % 20 == 0:
    print(step, sess.run(W), sess.run(b), sess.run(loss))
Hi, I ran into a problem while implementing this toy model in TensorFlow. When I used tf.reduce_sum() as the loss function, the optimizer failed to converge; in fact, the loss grew larger and larger. But when I changed the loss function from tf.reduce_sum() to tf.reduce_mean(), the optimizer converged. Can anyone tell me why tf.reduce_sum() doesn't work for this model but tf.reduce_mean() does?
Upvotes: 1
Views: 2007
Reputation: 4165
The loss obtained by summing over all the samples at once is larger than the mean loss.
For example, suppose the desired y_data = [1.2, 3.2, 2.4] and the predicted y = [1, 3, 3].
Then, with the following line:
tf.reduce_sum(tf.square(y - y_data))
the loss works out to be:
0.04 + 0.04 + 0.36 = 0.44
If you use reduce_mean instead, the same prediction leads to a lower loss, in this case:
0.44 / 3 ≈ 0.1467
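You can check this numerically; here is a minimal sketch (using the same TF1-style API as the question) that evaluates both reductions on the toy values above:

import tensorflow as tf

y_data = tf.constant([1.2, 3.2, 2.4])  # desired outputs
y = tf.constant([1.0, 3.0, 3.0])       # predictions

sq_err = tf.square(y - y_data)          # [0.04, 0.04, 0.36]
with tf.Session() as sess:
    print(sess.run(tf.reduce_sum(sq_err)))   # ~0.44
    print(sess.run(tf.reduce_mean(sq_err)))  # ~0.1467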
So your gradients and parameter updates are also larger when you use reduce_sum, which can make the optimizer overshoot and skip past a minimum.
Also, learning rates in optimizers are usually tuned for a per-example (mean) loss. If you want the same effect with a summed batch loss, you need to divide the learning rate by the batch size, or simply use reduce_mean to train the model.
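For instance, here is a sketch reusing the question's setup, with only the loss and the learning rate changed: since the gradient of the sum is exactly 100 times the gradient of the mean, reduce_sum over the 100 samples combined with a learning rate of 0.01 / 100 produces the same updates as reduce_mean with 0.01.

import tensorflow as tf
import numpy as np

# same toy data as in the question
x_data = np.float32(np.random.rand(2, 100))
y_data = np.dot([0.1, 0.2], x_data) + 0.3

b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

loss = tf.reduce_sum(tf.square(y - y_data))  # summed loss
# divide the learning rate by the batch size (100) to compensate
train = tf.train.GradientDescentOptimizer(0.01 / 100).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(201):
        sess.run(train)
    print(sess.run(W), sess.run(b))  # converges toward [0.1, 0.2] and 0.3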
Upvotes: 6
Reputation: 1476
I came across a similar issue. Have a look at user Stephen's response, which will answer your question: TensorFlow: Linear Regression loss increasing (instead decreasing) with successive epochs
Upvotes: 1