Reputation: 199
I'm currently learning TensorFlow and came across this notebook.
I have a question about how the mean squared error cost function is implemented:
import tensorflow as tf
import numpy as np
predicted = np.array([1,2,3])
Y = np.array([4,5,6])
num_instances = predicted.shape[0]
cost = tf.reduce_sum(tf.pow(predicted-Y, 2))/(2*num_instances)
cost2 = tf.reduce_mean(tf.square(predicted - Y))
with tf.Session() as sess:
    print(sess.run(cost))
    print(sess.run(cost2))
I don't get why the denominator of the first cost function is multiplied by 2. I got different answers from the two implementations of MSE: cost yields 4.5 while cost2 yields 9. Following the formula for mean squared error, I should get a value of 9, but the first cost function is the one implemented in the notebook I'm trying to learn from.
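For reference, the arithmetic behind the two values can be checked directly in plain NumPy, without a TensorFlow session:

```python
import numpy as np

predicted = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

sse = np.sum((predicted - Y) ** 2)  # sum of squared errors: 9 + 9 + 9 = 27
n = predicted.shape[0]              # 3 instances

print(sse / (2 * n))  # 27 / 6 = 4.5  (the notebook's cost)
print(sse / n)        # 27 / 3 = 9.0  (the textbook MSE, cost2)
```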
Upvotes: 3
Views: 4078
Reputation: 53768
The difference between cost and cost2 is exactly the 2 in 2*num_instances. Basically,
cost = tf.reduce_sum(tf.pow(predicted-Y, 2))/(2*num_instances)
cost2 = tf.reduce_sum(tf.pow(predicted-Y, 2))/(num_instances)
The scalar 2 doesn't affect the learning much: dividing the loss by an extra constant 2 is equivalent to using the second loss with half the learning rate, so using cost with a learning rate of 2*lr gives the same updates as cost2 with lr. (The factor 1/2 is a common convention because it cancels the 2 that appears when differentiating the squared term.) Note that whatever formula and network topology you use, you still need to select reasonable hyperparameters, including the learning rate.
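To see that equivalence concretely, here is a small sketch in plain NumPy (the gradients are written out by hand; the variable names are made up for illustration). A gradient step on cost with learning rate 2*lr lands exactly where a step on cost2 with lr does:

```python
import numpy as np

Y = np.array([4.0, 5.0, 6.0])
w = np.array([1.0, 2.0, 3.0])  # stand-in for the model's predictions/parameters
n = w.shape[0]

# Analytic gradients of the two losses with respect to w:
# cost  = sum((w - Y)^2) / (2n)  ->  grad = (w - Y) / n
# cost2 = mean((w - Y)^2)        ->  grad = 2 * (w - Y) / n
grad_cost = (w - Y) / n
grad_cost2 = 2 * (w - Y) / n

lr = 0.1
step_cost = w - 2 * lr * grad_cost   # cost, with learning rate doubled
step_cost2 = w - lr * grad_cost2     # cost2, with the base learning rate

print(np.allclose(step_cost, step_cost2))  # True: identical updates
```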
You can try inspecting the convergence of both loss functions; I suspect they perform the same. This means both formulas are fine, and the second one is just easier to implement.
Upvotes: 2