Reputation: 13
import tensorflow as tf
import numpy as np
#data generation
x_data = np.float32(np.random.rand(2, 100))
y_data = np.dot([0.1, 0.2], x_data) + 0.3
#linear model
b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b
#minimize squared error
loss = tf.reduce_sum(tf.square(y - y_data)) #why can't I use sum here?
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
#initialization
init = tf.global_variables_initializer()
#graph initialization
sess = tf.Session()
sess.run(init)
#train network
for step in range(201):
    sess.run(train)
    #if step % 20 == 0:
    print(step, sess.run(W), sess.run(b), sess.run(loss))
Hi, I ran into a problem while implementing this toy model in TensorFlow. When I used tf.reduce_sum() as the loss function, the optimizer failed to converge; in fact, the loss grew larger and larger. But when I changed the loss function from tf.reduce_sum() to tf.reduce_mean(), the optimizer converged. Can anyone tell me why tf.reduce_sum() doesn't work for this model but tf.reduce_mean() does?
Upvotes: 1
Views: 2007
Reputation: 4165
The loss obtained by summing over all the samples at once is larger than the mean loss.
For example, suppose the desired y_data = [1.2, 3.2, 2.4] and the predicted y = [1, 3, 3].
Then, with the following line:
tf.reduce_sum(tf.square(y - y_data))
the loss works out to be:
0.04 + 0.04 + 0.36 = 0.44
If you use reduce_mean instead, the same prediction leads to a lower loss, in this case:
0.44 / 3 ≈ 0.1467
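You can check this numerically; here is a minimal sketch (using the same TF1-style API as the question) that evaluates both reductions on the toy values above:

import tensorflow as tf

y_data = tf.constant([1.2, 3.2, 2.4])  # desired outputs
y = tf.constant([1.0, 3.0, 3.0])       # predictions

sq_err = tf.square(y - y_data)          # [0.04, 0.04, 0.36]
with tf.Session() as sess:
    print(sess.run(tf.reduce_sum(sq_err)))   # ~0.44
    print(sess.run(tf.reduce_mean(sq_err)))  # ~0.1467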
So your gradients and parameter updates are also larger when you use reduce_sum, which can make the optimizer overshoot and skip past a minimum.
Also, learning rates in optimizers are usually tuned for a per-example (mean) loss. If you want the same effect with a summed batch loss, you need to divide the learning rate by the batch size, or simply use reduce_mean to train the model.
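For instance, here is a sketch reusing the question's setup, with only the loss and the learning rate changed: since the gradient of the sum is exactly 100 times the gradient of the mean, reduce_sum over the 100 samples combined with a learning rate of 0.01 / 100 produces the same updates as reduce_mean with 0.01.

import tensorflow as tf
import numpy as np

# same toy data as in the question
x_data = np.float32(np.random.rand(2, 100))
y_data = np.dot([0.1, 0.2], x_data) + 0.3

b = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
y = tf.matmul(W, x_data) + b

loss = tf.reduce_sum(tf.square(y - y_data))  # summed loss
# divide the learning rate by the batch size (100) to compensate
train = tf.train.GradientDescentOptimizer(0.01 / 100).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(201):
        sess.run(train)
    print(sess.run(W), sess.run(b))  # converges toward [0.1, 0.2] and 0.3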
Upvotes: 6
Reputation: 1476
I came across a similar issue. Have a look at user Stephen's response, which will answer your question: TensorFlow: Linear Regression loss increasing (instead decreasing) with successive epochs
Upvotes: 1