Reputation: 81
I am trying to compute weighted mean squared error for my regression problem. I have y_true, y_predicted, and y_wts numpy arrays. Each array is shaped (N,1) where N is the number of samples. I don't understand why the following 2 pieces of code give different answers:
1st code segment:
import numpy as np
sq_error = (y_true - y_predicted)**2
wtd_sq_error = np.multiply(sq_error, y_wts)
wtd_mse = np.mean(wtd_sq_error)
2nd code segment (taken from the sklearn.metrics mean_squared_error function):
wtd_mse_sklearn = np.average((y_true - y_predicted)**2, axis=0,
                             weights=y_wts)
I came to test this because of a mismatch between TensorFlow's weighted mean squared error and sklearn.metrics mean_squared_error (with a weight column specified). Note that this mismatch doesn't occur when I don't specify a weight column.
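To make the comparison concrete, here is a self-contained sketch with toy arrays (the values are made up purely for illustration) that reproduces the difference I'm seeing:

import numpy as np

# toy data shaped (N, 1), values are illustrative only
y_true = np.array([[1.0], [2.0], [3.0]])
y_predicted = np.array([[1.5], [1.5], [2.0]])
y_wts = np.array([[1.0], [2.0], [3.0]])

sq_error = (y_true - y_predicted)**2

# 1st code segment
wtd_mse = np.mean(np.multiply(sq_error, y_wts))                # 1.25

# 2nd code segment (sklearn-style)
wtd_mse_sklearn = np.average(sq_error, axis=0, weights=y_wts)  # 0.625

print(wtd_mse, wtd_mse_sklearn)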
Thanks for your help!
Upvotes: 1
Views: 3978
Reputation: 26886
The formula for the weighted average in your 1st code segment is wrong; it should be:
wtd_mse = np.sum(sq_error * y_wts) / np.sum(y_wts)
instead of:
wtd_mse = np.mean(wtd_sq_error)
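For example, with some hypothetical toy arrays (shaped (N, 1) as in your question), the corrected formula matches the np.average-based computation:

import numpy as np

# toy data, purely illustrative
y_true = np.array([[1.0], [2.0], [3.0]])
y_predicted = np.array([[1.5], [1.5], [2.0]])
y_wts = np.array([[1.0], [2.0], [3.0]])

sq_error = (y_true - y_predicted)**2

# weighted mean: divide by the sum of the weights, not by N
wtd_mse = np.sum(sq_error * y_wts) / np.sum(y_wts)
wtd_mse_sklearn = np.average(sq_error, axis=0, weights=y_wts)

print(wtd_mse, wtd_mse_sklearn)  # both give 0.625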
Upvotes: 0
Reputation: 12772
Because you forgot about the weights in the denominator:

np.mean(wtd_sq_error) = sum(error_i * weight_i ∀ i) / N

while

np.average(sq_error, weights=y_wts) = sum(error_i * weight_i ∀ i) / sum(weight_i ∀ i)
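A quick sketch with made-up numbers shows the two denominators in action:

import numpy as np

sq_error = np.array([0.25, 0.25, 1.0])   # (y_true - y_predicted)**2
weights = np.array([1.0, 2.0, 3.0])

print(np.mean(sq_error * weights))            # 3.75 / 3 = 1.25  (divides by N)
print(np.average(sq_error, weights=weights))  # 3.75 / 6 = 0.625 (divides by sum of weights)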
Upvotes: 1