XCeptable

Reputation: 1267

MAPE (mean absolute percentage error) measurement in Python results in an error

I am trying to measure the MAPE (mean absolute percentage error) of my random forest model. The MAE is 7.5. When I try to calculate MAPE, it outputs:

Accuracy: -inf %

Here is my code for calculating MAPE. How can I make it work, or why is it not calculating a value?

mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

Here are the values:

 errors: array([ 2.165,  6.398,  2.814, ..., 21.268,  8.746, 11.63 ])
 test_labels: array([45, 47, 98, ..., 87, 47, 72])

These are types:

var1          int64
var2          int64
var3          float64
var4          int64
var6          float64
var7          int64
var1.         float64
dtype: object

Example values (over 8000 entries):

      var1  var2.   var3               var4     var5                var6    var7
"420823370" "183"   "2019-09-07 22:13:04"   "84"    "2019-09-07 22:12:46"   "72"    "00:00:18"
"420521201" "183"   "2019-09-07 17:43:03"   "84"    "2019-09-07 17:42:51"   "46"    "00:00:12"
"420219554" "183"   "2019-09-07 12:43:02"   "88"    "2019-09-07 12:42:39"   "72"    "00:00:23"
"419618820" "183"   "2019-09-07 02:43:01"   "92"    "2019-09-07 02:42:46"   "80"    "00:00:15"
"419618819" "183"   "2019-09-07 02:43:01"   "84"    "2019-09-07 02:42:46"   "80"    "00:00:15"
"417193989" "183"   "2019-09-05 10:42:52"   "82"    "2019-09-05 10:42:23"   "0"    "00:00:29"
"416891691" "183"   "2019-09-05 05:42:51"   "78"    "2019-09-05 05:42:49"   "72"    "00:00:02"
"416587222" "183"   "2019-09-05 00:42:51"   "88"    "2019-09-05 00:42:35"   "99"    "00:00:16"
"416587223" "183"   "2019-09-05 00:42:51"   "82"    "2019-09-05 00:42:35"   "99"    "00:00:16"
"416587224" "183"   "2019-09-05 00:42:51"   "80"    "2019-09-05 00:42:35"   "99"    "00:00:16"

id:Big Int. ts_tuid: Big Int. rssi: numeric. batl: real. ts_diff:interval 

Here is a code example:

import numpy as np
import pandas as pd

# Load data from CSV

model = (
    pd.read_csv("source.csv", parse_dates=['var3', 'var5'], date_parser=lambda x: pd.to_datetime(x))
    .assign(
        rssi_ts=lambda x: x.loc[:, 'var3'].astype(int) / 10 ** 9,
        batl_ts=lambda x: x.loc[:, 'var5'].astype(int) / 10 ** 9,
        ts_diff=lambda x: pd.to_timedelta(x.loc[:, 'ts_diff']).astype(int) / 10 ** 9
    )
)

# Labels are the values we want to predict
labels_b = np.array(halti['var4'])
# Remove the labels from the features
# axis 1 refers to the columns
features_r = halti.drop('var4', axis = 1)
features_r2 = list(features_r.columns) 
# Convert to numpy array
features_r = np.array(features_r)

# Using scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(features_r, labels_b, test_size = 0.25, random_state = 42)

# Import the model we are using
from sklearn.ensemble import RandomForestRegressor
# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
# Train the model on training data
rf.fit(train_features, train_labels);

# Use the forest's predict method on the test data
predictions = rf.predict(test_features)
# Calculate the absolute errors
errors = abs(predictions - test_labels)
# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')

mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

Upvotes: 1

Views: 5663

Answers (2)

Akash Desai

Reputation: 518

The output shows inf for the MAPE error measure because there are zeros in the observed values. MAPE divides each error by the observed value, so when the dependent variable can take zero as a value, MAPE cannot be used as an error measure. In this case, other error measures should be used.

Reference: https://rstudio-pubs-static.s3.amazonaws.com/390751_f6b763e827b24c9cb4406cd43129c8a9.html
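
As an illustration of one such alternative, here is a minimal sketch of WAPE (weighted absolute percentage error), which divides the total absolute error by the total of the absolute observed values, so a single zero label does not break it. The choice of WAPE, the helper name wape, and the sample arrays are all assumptions made for illustration; they are not taken from the question or from the linked reference.

```python
import numpy as np


def wape(actual, predicted):
    """Weighted absolute percentage error, in percent (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = np.sum(np.abs(actual))
    if total == 0:
        # Still undefined if every observed value is zero
        raise ValueError("WAPE is undefined when all actual values are zero")
    return 100 * np.sum(np.abs(predicted - actual)) / total


# Made-up sample data containing one zero label
actual = np.array([45, 47, 0, 87, 72])
predicted = np.array([47.0, 41.0, 3.0, 80.0, 75.0])

score = wape(actual, predicted)
print("WAPE:", round(score, 2), "%")
```

Unlike MAPE, this weights rows by the size of the observed value, so it answers a slightly different question. Whether that is appropriate depends on the data.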

Upvotes: 0

kevins_1

Reputation: 1306

You are getting this result because MAPE is undefined when a test label is 0 (dividing by zero produces inf), which is one of several shortcomings of using MAPE. If you replace accuracy = 100 - np.mean(mape) with accuracy = 100 - np.mean(mape[np.isfinite(mape)]) you will get a more sensible number, computed only over the rows where MAPE is finite.
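
A minimal sketch of that replacement on made-up data (the arrays below are hypothetical stand-ins for the question's test_labels and predictions, with one label set to 0 to reproduce the problem):

```python
import numpy as np

# Hypothetical sample data: one true label is 0
test_labels = np.array([45, 47, 0, 87, 72])
predictions = np.array([47.0, 41.0, 3.0, 80.0, 75.0])

# Absolute errors, as in the question
errors = np.abs(predictions - test_labels)

# Dividing by the zero label yields inf; suppress the runtime warning for the demo
with np.errstate(divide="ignore", invalid="ignore"):
    mape = 100 * (errors / test_labels)

# Original calculation: the single inf makes the mean inf, so accuracy is -inf
naive_accuracy = 100 - np.mean(mape)

# Suggested fix: average only the finite entries
finite = np.isfinite(mape)
accuracy = 100 - np.mean(mape[finite])

print("Naive accuracy:", naive_accuracy)
print("Accuracy:", round(accuracy, 2), "%.")
print("Rows excluded:", int((~finite).sum()))
```

Note that the filtered number ignores the zero-label rows entirely, so it is worth reporting how many rows were excluded alongside it.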

Upvotes: 3
