Reputation: 21632
Why is mean used instead of sum in loss functions?
i.e. is there any reason why this is preferred
import tensorflow as tf

def mae_loss(y_true, y_pred):
    loss = tf.reduce_mean(tf.abs(y_true - y_pred))
    return loss
to this
def mae_loss(y_true, y_pred):
    loss = tf.reduce_sum(tf.abs(y_true - y_pred))
    return loss
The mean variant is also what the Keras source code uses.
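As a quick sanity check (a sketch against the public tf.keras API, not the Keras source itself), the built-in MAE agrees with the reduce_mean variant, not the reduce_sum one:

# Illustrative check: tf.keras.losses.mae averages over the last axis,
# matching the reduce_mean version above (values here are made up).
import tensorflow as tf

y_true = tf.constant([[1.0, 2.0, 3.0]])
y_pred = tf.constant([[1.5, 2.0, 2.0]])

builtin = tf.keras.losses.mae(y_true, y_pred)                    # per-sample mean
manual = tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

print(builtin.numpy(), manual.numpy())  # both ~[0.5]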
Upvotes: 3
Views: 7508
Reputation: 2437
I suppose it is mostly for the sake of interpretability. When you take the average of the loss over the data points, you get a better sense of how your model is actually doing.
For example, suppose you have the task of predicting the grades of 30 students in each of two college classes (the classes play the role of batches). Each class (A and B) has 30 students, with their grades as the true labels.
If you model this task with a neural network, your ground truth is a tensor of shape [2, 30], where each element is a number between 0 (minimum) and 20 (maximum). Your network will also output a tensor of the same shape (i.e., [2, 30]) as the grade predictions.
When you calculate the mean absolute error (the loss from the question) with mean reduction, you get a number that is guaranteed to lie between 0 and 20, telling you how far, on average, each student's predicted grade is from his/her real score. This intuition is much easier to grasp than the sum of these losses over all the students within a class, or even over all the students in the college (assuming the college has only these two classes).
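A minimal sketch of this, with random numbers standing in for the real grades:

# Illustrative sketch: mean reduction yields an "average points off per
# student" on the original 0-20 grade scale, while sum grows with the
# number of students (the grades here are random placeholders).
import tensorflow as tf

tf.random.set_seed(0)
y_true = tf.random.uniform([2, 30], minval=0.0, maxval=20.0)  # 2 classes x 30 students
y_pred = tf.random.uniform([2, 30], minval=0.0, maxval=20.0)

errors = tf.abs(y_true - y_pred)
print(tf.reduce_mean(errors).numpy())  # e.g. ~6.7: average points off per student
print(tf.reduce_sum(errors).numpy())   # e.g. ~400: scales with the 2 * 30 = 60 students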
But if you are wondering how they would affect the learning process of the neural network, I would say it shouldn't make much difference. The mean is just the sum divided by a constant (the number of elements), so with either reduction the network optimizes toward the same minimizers; the constant factor only rescales the gradients, which can be absorbed into the learning rate.
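A toy sketch making this concrete (the linear model w * x and all values here are made up purely for illustration): the two reductions yield gradients that differ only by the constant factor N.

# Sketch: the gradient of the sum loss is exactly N times the gradient of
# the mean loss (here N = 4), so switching between them is equivalent to
# rescaling the learning rate.
import tensorflow as tf

w = tf.Variable(2.0)
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = tf.constant([3.0, 5.0, 7.0, 9.0])

with tf.GradientTape(persistent=True) as tape:
    pred = w * x
    loss_mean = tf.reduce_mean(tf.abs(y - pred))
    loss_sum = tf.reduce_sum(tf.abs(y - pred))

g_mean = tape.gradient(loss_mean, w)
g_sum = tape.gradient(loss_sum, w)
print(g_mean.numpy(), g_sum.numpy())  # g_sum == 4 * g_mean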
Upvotes: 1
Reputation: 512
We usually calculate a loss either to compare it with other losses or to decrease it as much as we can. If you take the sum instead of the mean, the result varies with the number of data points, so it is hard to tell intuitively whether the value is large or small. That is why we usually use 'mean squared error' or 'mean absolute error' rather than their summed counterparts.
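For instance, a small sketch that duplicates the same per-element errors to mimic a larger batch (the numbers are arbitrary):

# With identical per-element error, the summed loss grows with the batch,
# while the mean stays put and remains comparable across batch sizes.
import tensorflow as tf

err = tf.abs(tf.constant([1.0, 3.0]) - tf.constant([2.0, 5.0]))      # batch of 2
err_big = tf.tile(err, [4])                                          # same errors, batch of 8

print(tf.reduce_mean(err).numpy(), tf.reduce_mean(err_big).numpy())  # 1.5 and 1.5
print(tf.reduce_sum(err).numpy(), tf.reduce_sum(err_big).numpy())    # 3.0 and 12.0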
Upvotes: 11