Reputation: 388
Everyone, I'm a newbie in data science. I'm working on a regression problem using support vector regression. After tuning the SVM parameters with grid search I got a MAPE of 2.6%, but my MAE and MSE are still very high.
I have used a user-defined function for MAPE.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from sklearn import preprocessing
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import Normalizer

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

features = pd.read_csv('selectedData.csv')
print(features.shape)

# drop rows containing outliers (|z-score| >= 3 in any column)
features = features[(np.abs(stats.zscore(features)) < 3).all(axis=1)]
target = features['SYSLoad']
features = features.drop('SYSLoad', axis=1)

# min-max scale each feature column to [0, 1]
names = list(features)
for i in names:
    x = features[[i]].values.astype(float)
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    features[i] = x_scaled
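As an aside, the per-column loop above can be collapsed into a single call, since `MinMaxScaler` already scales each column of a DataFrame independently. A minimal sketch with toy data (column names are made up):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# toy frame standing in for the real feature columns
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})

# fit_transform scales every column to [0, 1] in one pass
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df),
                      columns=df.columns, index=df.index)
print(scaled)
```
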
# finding feature importances
from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(
    features, target, test_size=0.25, random_state=42)

# fit the normalizer on the training data only, then apply it to both sets
trans = Normalizer().fit(train_input)
train_input = trans.transform(train_input)
test_input = trans.transform(test_input)

test_targ = pd.DataFrame(test_target.values)
from sklearn.svm import SVR

# note: the variable is named svr_rbf, but the kernel used here is 'poly'
svr_rbf = SVR(kernel='poly', C=10, epsilon=10, gamma=10)
y_rbf = svr_rbf.fit(train_input, train_target)
predicted = y_rbf.predict(test_input)

print('Total Days For training', len(train_input))
print('Total Days For Testing', len(test_input))

plt.figure()
plt.xlim(20, 100)
plt.ylabel('Load(MW) Prediction 3')
plt.xlabel('Days')
plt.plot(test_targ, '-b', label='Actual')
plt.plot(predicted, '-r', label='RBF kernel')
plt.gca().legend(('Actual', 'RBF'))
plt.title('SVM')
plt.show()
MAPE = mean_absolute_percentage_error(test_target, predicted)
print(MAPE)
mae = mean_absolute_error(test_targ, predicted)
mse = mean_squared_error(test_targ, predicted)
print(mae)
print(mse)
I'm getting MAPE = 2.56, MAE = 400, MSE = 437696. Aren't the MAE and MSE huge, and why are they? My target variable, SYSLoad, contains values in the range of tens of thousands.
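One quick sanity check is to bring the MSE back to the target's own units by taking its square root; a small sketch using the MSE value reported above:

```python
import numpy as np

mse = 437696.0        # MSE value reported above
rmse = np.sqrt(mse)   # RMSE is in the same MW units as MAE
print(rmse)           # roughly 661.6
```
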
Upvotes: 1
Views: 3204
Reputation: 8160
Since you have not provided the data, we cannot reproduce your example. But take a look at this:
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
Your code
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred)
return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
Output
32.73809523809524
Let's compare
mean_squared_error(y_true, y_pred)
0.375
Here the predictions are very close in absolute terms (small MSE) even though the MAPE is large; the two metrics are on different scales. Something is probably wrong with your feature selection.
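To see how the scales interact, here is a small sketch using the same toy arrays: MAPE is scale-invariant, while MAE scales linearly and MSE quadratically with the magnitude of the target (the factor of 1000 is arbitrary, chosen to be closer to the range of SYSLoad):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

scale = 1000  # multiply both arrays by the same factor
print(mean_absolute_percentage_error(y_true, y_pred))                  # 32.738...
print(mean_absolute_percentage_error(y_true * scale, y_pred * scale))  # unchanged
print(mean_absolute_error(y_true * scale, y_pred * scale))             # 500.0
print(mean_squared_error(y_true * scale, y_pred * scale))              # 375000.0
```
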
Upvotes: 2