Patricia Centeno

Reputation: 15

Why is my Mean Absolute Error (MAE) from a MultiOutputRegressor method showing one value instead of three?

I need to predict 3 different outputs and then calculate the mean absolute error (MAE) for each output. Support Vector Machine regression does not support multi-output regression natively, unlike models such as Random Forest and Linear Regression, so I wrap it in scikit-learn's MultiOutputRegressor class, which fits a separate model for each output.

In the code below, X and X_test are my features for training and testing, and y and y_test are my targets.

1) First, I want to show that my targets (y) do have 3 columns:

print(X.shape, X_test.shape,y.shape,y_test.shape)

(10845, 2116) (4648, 2116) (10845, 3) (4648, 3)

2) Then I have the following code to calculate the mean absolute error (MAE) as well as to train a model and evaluate it on the dataset:

import numpy as np

# Function to calculate mean absolute error
def mae(y_true, y_pred):
    return np.mean(abs(y_true - y_pred))

# Function to take in a model, train it and evaluate it on the test set
def fit_and_evaluate2(model):

    # Train the model with training dataset for features (X) and target (y) 
    model.fit(X, y)

    # Make predictions for the test dataset and evaluate the predictions vs the target in the test dataset
    model_pred = model.predict(X_test)
    model_mae = mae(y_test, model_pred)

    # Return the performance metric
    return model_mae

3) When I call this function for my Support Vector Machine Regression, the output given by model_pred is in fact 3 values, but the MAE model_mae is only 1 value:

from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

svm = SVR(C=1000, gamma=0.1)
wrapper = MultiOutputRegressor(svm)

svm_mae = fit_and_evaluate2(wrapper)

print('Support Vector Machine Regression Performance on the test set is')
svm_mae

Support Vector Machine Regression Performance on the test set is
0.19932177495538966

I don't understand why model_mae shows only one value, since, as shown above, my target y has 3 columns and model_pred also has 3 columns. Is there something I am doing wrong? I tried this with Random Forest and both the predictions and the MAE show 3 values.

Upvotes: 1

Views: 3571

Answers (1)

desertnaut

Reputation: 60400

The reason is the default axis=None which is used in np.mean when no axis argument is specified; from the docs:

axis: None or int or tuple of ints, optional

Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

So np.mean first flattens the array of absolute errors (i.e. the 3 outputs are no longer kept separate) and then averages over all of its elements, which gives a single number for the MAE.
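A minimal sketch of the difference, using a small made-up array of absolute errors (4 samples, 3 outputs; the values are purely illustrative). It also shows that the single number obtained with the default axis=None is just the average of the per-output MAEs, since every column has the same number of rows:

```python
import numpy as np

# Hypothetical absolute errors: 4 samples (rows) x 3 outputs (columns)
errors = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0],
                   [2.0, 4.0, 6.0],
                   [2.0, 0.0, 2.0]])

# Default axis=None: the array is flattened, so the result is one scalar
flat_mean = np.mean(errors)

# axis=0: average down the rows, giving one value per output column
per_output = np.mean(errors, axis=0)

print(flat_mean)    # a single number, about 2.33
print(per_output)   # [2. 2. 3.]

# The scalar equals the mean of the per-output values
print(np.isclose(flat_mean, per_output.mean()))  # True
```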

You should change the definition of your mae function to:

def mae(y_true, y_pred):
    return np.mean(abs(y_true - y_pred), axis=0)

Let's confirm that it will work with some dummy data:

import numpy as np

# 2-output data
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])
mae(y_true, y_pred)
# array([0.5, 1. ])

i.e. a 2-valued MAE output, as required.

We can confirm this result using scikit-learn's mean_absolute_error with the argument multioutput='raw_values':

from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred, multioutput='raw_values')
# array([0.5, 1. ])

Arguably, since you are already using scikit-learn, you would be better off using its existing MAE function instead of writing your own.
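As a rough sketch of how that could look in the evaluation function from the question: the data below is synthetic (generated with make_regression as a stand-in for the real features and targets, which is an assumption), and fit_and_evaluate is an illustrative rewrite of the original helper, not the only way to structure it.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the real data (assumption): 10 features, 3 targets
X_all, y_all = make_regression(n_samples=200, n_features=10, n_targets=3,
                               noise=0.1, random_state=0)
X, X_test, y, y_test = train_test_split(X_all, y_all,
                                        test_size=0.3, random_state=0)

def fit_and_evaluate(model):
    # Train on the training split and predict on the test split
    model.fit(X, y)
    model_pred = model.predict(X_test)
    # multioutput='raw_values' returns one MAE per target column
    return mean_absolute_error(y_test, model_pred, multioutput='raw_values')

svm_mae = fit_and_evaluate(MultiOutputRegressor(SVR(C=1000, gamma=0.1)))
print(svm_mae.shape)  # (3,)
```

With the default multioutput='uniform_average', mean_absolute_error would instead return the average of these three values as a single number.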

Upvotes: 1
