Ruta Desai

Reputation: 81

numpy weighted average for calculating weighted mean squared error

I am trying to compute the weighted mean squared error for my regression problem. I have y_true, y_predicted, and y_wts NumPy arrays, each shaped (N, 1), where N is the number of samples. I don't understand why the following two pieces of code give different answers:

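1st code segment:

import numpy as np

# Square the residuals and scale each one by its weight
sq_error = (y_true - y_predicted)**2
wtd_sq_error = np.multiply(sq_error, y_wts)
# Average the weighted squared errors over all N samples
wtd_mse = np.mean(wtd_sq_error)

2nd code segment, taken from the sklearn.metrics mean_squared_error function:

# Weighted average of the squared errors along the sample axis
wtd_mse_sklearn = np.average((y_true - y_predicted)**2, axis=0,
                             weights=y_wts)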
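A minimal reproduction of the two segments, using small made-up arrays (the values below are hypothetical, not the original data), makes the gap concrete:

```python
import numpy as np

# Hypothetical example data, each shaped (N, 1) as in the question
y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
y_predicted = np.array([[1.5], [2.0], [2.0], [5.0]])
y_wts = np.array([[1.0], [2.0], [3.0], [4.0]])

# 1st code segment: weight the squared errors, then take a plain mean over N
sq_error = (y_true - y_predicted) ** 2
wtd_mse = np.mean(np.multiply(sq_error, y_wts))

# 2nd code segment: weighted average along axis 0 (result has shape (1,))
wtd_mse_sklearn = np.average(sq_error, axis=0, weights=y_wts)

print(wtd_mse, wtd_mse_sklearn)
```

Here the weighted squared errors sum to 7.25, so the first segment gives 7.25 / 4 = 1.8125, while the second gives 7.25 / 10 = 0.725 (10 being the sum of the weights). Note also that the second result is a length-1 array because of axis=0.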

I came to test this because of a mismatch between TensorFlow's weighted mean squared error and sklearn.metrics mean_squared_error (with a weight column specified). Note that this mismatch doesn't occur when I don't specify a weight column.

Thanks for your help!

Upvotes: 1

Views: 3978

Answers (2)

norok2

Reputation: 26886

The formula for the weighted average in your 1st code segment is wrong; it should be:

wtd_mse = np.sum(sq_error * y_wts) / np.sum(y_wts)

instead of:

wtd_mse = np.mean(wtd_sq_error)
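A quick check, on randomly generated placeholder data (not the asker's arrays), that the corrected formula agrees with the np.average call used by sklearn:

```python
import numpy as np

# Hypothetical data, shaped (N, 1) as in the question
rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 1))
y_predicted = y_true + rng.normal(scale=0.1, size=(100, 1))
y_wts = rng.uniform(0.5, 2.0, size=(100, 1))

sq_error = (y_true - y_predicted) ** 2

# Corrected weighted mean: normalize by the total weight, not by N
wtd_mse = np.sum(sq_error * y_wts) / np.sum(y_wts)

# Same quantity via np.average; take the single element of the (1,) result
wtd_mse_sklearn = np.average(sq_error, axis=0, weights=y_wts)[0]

print(np.isclose(wtd_mse, wtd_mse_sklearn))  # True
```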

Upvotes: 0

Fabricator

Reputation: 12772

Because you forgot to divide by the sum of the weights:

np.mean(wtd_sq_error) = sum(error_i * weight_i, over all i) / N

while

np.average(sq_error, weights=y_wts) = sum(error_i * weight_i, over all i) / sum(weight_i, over all i)
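It follows that the two results differ by exactly the factor sum(weights) / N. A small sketch with hypothetical squared errors and weights illustrates this, and shows that rescaling the weights to average 1 makes the two agree:

```python
import numpy as np

# Hypothetical squared errors and weights, shaped (N, 1)
sq_error = np.array([[0.25], [0.0], [1.0], [1.0]])
y_wts = np.array([[1.0], [2.0], [3.0], [4.0]])
n = sq_error.shape[0]

mean_version = np.mean(sq_error * y_wts)                          # divides by N
average_version = np.average(sq_error, axis=0, weights=y_wts)[0]  # divides by sum of weights

# The two differ exactly by the factor sum(weights) / N
print(np.isclose(mean_version, average_version * np.sum(y_wts) / n))  # True

# Rescaling the weights so they average to 1 makes sum(weights) == N
unit_wts = y_wts / np.mean(y_wts)
print(np.isclose(np.mean(sq_error * unit_wts),
                 np.average(sq_error, axis=0, weights=unit_wts)[0]))  # True
```

This also explains why no mismatch appears without a weight column: with every weight equal to 1, sum(weights) equals N and both expressions reduce to the ordinary mean squared error.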

Upvotes: 1
