Reputation: 199
I'm currently learning TensorFlow and came across this notebook.
I have a question about how the mean squared error cost function is implemented:
import tensorflow as tf
import numpy as np
predicted = np.array([1,2,3])
Y = np.array([4,5,6])
num_instances = predicted.shape[0]
cost = tf.reduce_sum(tf.pow(predicted-Y, 2))/(2*num_instances)
cost2 = tf.reduce_mean(tf.square(predicted - Y))
with tf.Session() as sess:
    print(sess.run(cost))
    print(sess.run(cost2))
I don't get why the denominator of the first cost function is multiplied by 2. I got different answers from the two implementations of MSE: cost yields 4.5 while cost2 yields 9. Following the formula for mean squared error, I should get a value of 9, but the first cost function is the one implemented in the notebook I'm trying to learn from.
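For reference, the arithmetic behind the two values can be checked directly in plain NumPy, without a TensorFlow session:

```python
import numpy as np

predicted = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

sse = np.sum((predicted - Y) ** 2)  # sum of squared errors: 9 + 9 + 9 = 27
n = predicted.shape[0]              # 3 instances

print(sse / (2 * n))  # 27 / 6 = 4.5  (the notebook's cost)
print(sse / n)        # 27 / 3 = 9.0  (the textbook MSE, cost2)
```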
Upvotes: 3
Views: 4078
Reputation: 53768
The difference between cost and cost2 is exactly the 2 in 2*num_instances. Basically,
cost = tf.reduce_sum(tf.pow(predicted-Y, 2))/(2*num_instances)
cost2 = tf.reduce_sum(tf.pow(predicted-Y, 2))/(num_instances)
The scalar 2 doesn't affect the learning much: dividing the loss by an extra constant 2 is equivalent to using the second loss with half the learning rate, so using cost with a learning rate of 2*lr gives the same updates as cost2 with lr. (The factor 1/2 is a common convention because it cancels the 2 that appears when differentiating the squared term.) Note that whatever formula and network topology you use, you still need to select reasonable hyperparameters, including the learning rate.
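To see that equivalence concretely, here is a small sketch in plain NumPy (the gradients are written out by hand; the variable names are made up for illustration). A gradient step on cost with learning rate 2*lr lands exactly where a step on cost2 with lr does:

```python
import numpy as np

Y = np.array([4.0, 5.0, 6.0])
w = np.array([1.0, 2.0, 3.0])  # stand-in for the model's predictions/parameters
n = w.shape[0]

# Analytic gradients of the two losses with respect to w:
# cost  = sum((w - Y)^2) / (2n)  ->  grad = (w - Y) / n
# cost2 = mean((w - Y)^2)        ->  grad = 2 * (w - Y) / n
grad_cost = (w - Y) / n
grad_cost2 = 2 * (w - Y) / n

lr = 0.1
step_cost = w - 2 * lr * grad_cost   # cost, with learning rate doubled
step_cost2 = w - lr * grad_cost2     # cost2, with the base learning rate

print(np.allclose(step_cost, step_cost2))  # True: identical updates
```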
You can try inspecting the convergence of both loss functions; I suspect they perform the same. This means both formulas are fine, and the second one is just easier to implement.
Upvotes: 2