Reputation: 47
I have a model with N inputs and 6 outputs. After each epoch, my output looks like [x y z xx yy zz], and I want to minimize the MSE of each term separately. However, I've noticed that when I use MSE as the loss function, it just takes the mean of the sum of the squares over the entire set.
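For reference, a minimal sketch of that behaviour, assuming a Keras-style MSE in TensorFlow 2 with eager execution (the shapes and toy values are just for illustration): the loss first averages the squared error over the 6 outputs of each sample and then over the batch, producing a single scalar.

import numpy as np
import tensorflow as tf

y_true = np.zeros((3, 6), dtype=np.float32)   # 3 samples, 6 outputs each
y_pred = np.ones((3, 6), dtype=np.float32)

# Per-sample MSE: mean of the squared errors over the 6 outputs, shape (3,)
per_sample = tf.keras.losses.mean_squared_error(y_true, y_pred)

# The loss object then averages over the batch, giving one scalar for the whole set
loss = tf.keras.losses.MeanSquaredError()(y_true, y_pred)

print(per_sample.numpy(), loss.numpy())   # [1. 1. 1.]  1.0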
Upvotes: 3
Views: 2047
Reputation: 9075
I think they both mean the same thing. Let us denote your predictions for the $i$-th sample by $[x_i, y_i, z_i, xx_i, yy_i, zz_i]$ and the corresponding true values by $[t_{x_i}, t_{y_i}, t_{z_i}, t_{xx_i}, t_{yy_i}, t_{zz_i}]$. Over a batch of $N$ samples, you want to minimize:

$$L = \frac{1}{N}\sum_{i=1}^{N}(x_i - t_{x_i})^2 + \dots + \frac{1}{N}\sum_{i=1}^{N}(zz_i - t_{zz_i})^2$$
The MSE loss will minimize the following:
$$L = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{6}\left[(x_i - t_{x_i})^2 + \dots + (zz_i - t_{zz_i})^2\right]$$
The two expressions differ only by the constant factor $\frac{1}{6}$, which does not change where the minimum lies, so both ultimately minimize the same quantity. This holds as long as your six outputs are independent targets, which I think they are, since you model them as six distinct outputs with six ground-truth labels.
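A quick NumPy check (with arbitrary toy shapes and values, purely for illustration) makes this concrete: the sum of the six per-output MSEs is exactly 6 times the single overall MSE, so minimizing one minimizes the other.

import numpy as np

np.random.seed(0)
preds = np.random.rand(4, 6)   # N = 4 samples, 6 predicted outputs each
truth = np.random.rand(4, 6)   # matching ground-truth values

# First form: per-output MSEs (averaged over the batch), then summed over the 6 outputs
per_output_sum = np.mean((preds - truth) ** 2, axis=0).sum()

# Second form: one MSE over the whole (4, 6) array, as a standard MSE loss computes it
overall_mse = np.mean((preds - truth) ** 2)

print(per_output_sum, 6 * overall_mse)   # equal up to floating-point error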
Upvotes: 2
Reputation: 1651
You have to create a tensor equal to the MSE and minimize that (here outputs are the model predictions and targets is assumed to be the ground-truth tensor):

mse = tf.reduce_mean(tf.square(outputs - targets))   # squared error, averaged over all samples and outputs
train_step = tf.train.*Optimizer(...).minimize(mse)  # any optimizer from tf.train
for _ in range(iterations):
    sess.run(train_step, ...)                        # feed a batch of inputs and targets each step
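For completeness, a self-contained sketch of the same idea in TF1 graph mode; the toy linear model, the shapes, the learning rate, and the choice of GradientDescentOptimizer are all assumptions for illustration, not part of the original answer.

import numpy as np
import tensorflow as tf

# Toy data: 128 samples, 10 input features, 6 target outputs (all shapes are assumptions)
x_data = np.random.rand(128, 10).astype(np.float32)
y_data = np.random.rand(128, 6).astype(np.float32)

inputs = tf.placeholder(tf.float32, shape=[None, 10])
targets = tf.placeholder(tf.float32, shape=[None, 6])

# A single linear layer standing in for the real model
weights = tf.Variable(tf.random_normal([10, 6]))
bias = tf.Variable(tf.zeros([6]))
outputs = tf.matmul(inputs, weights) + bias

# One scalar MSE over all six outputs, minimized directly
mse = tf.reduce_mean(tf.square(outputs - targets))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(mse)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_step, feed_dict={inputs: x_data, targets: y_data})
    print(sess.run(mse, feed_dict={inputs: x_data, targets: y_data}))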
Upvotes: 1