Reputation: 1267
I am trying to measure MAPE (mean absolute percentage error) in my random forest code. The MAE value is 7.5, but when I try to calculate MAPE, it outputs:
Accuracy: -inf %
Here is my code for calculating MAPE. How can I make it work, or why is it not calculating a value?
mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
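For reference, this snippet computes the standard MAPE and reports its complement as "accuracy". Here y_i are the test labels, \hat{y}_i the predictions and n the number of test samples. The code divides by test_labels rather than by its absolute value, which gives the same result for non-negative labels:

```latex
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i\rvert}{\lvert y_i\rvert},
\qquad
\text{accuracy} = 100 - \mathrm{MAPE}
```

Every term divides by y_i, so the expression is undefined for any test sample with y_i = 0.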
Here are the values:
errors: array([ 2.165, 6.398, 2.814, ..., 21.268, 8.746, 11.63 ])
test_labels: array([45, 47, 98, ..., 87, 47, 72])
These are the column dtypes:
var1 int64
var2 int64
var3 float64
var4 int64
var6 float64
var7 int64
var1. float64
dtype: object
Example values (over 8000 entries):
var1 var2 var3 var4 var5 var6 var7
"420823370" "183" "2019-09-07 22:13:04" "84" "2019-09-07 22:12:46" "72" "00:00:18"
"420521201" "183" "2019-09-07 17:43:03" "84" "2019-09-07 17:42:51" "46" "00:00:12"
"420219554" "183" "2019-09-07 12:43:02" "88" "2019-09-07 12:42:39" "72" "00:00:23"
"419618820" "183" "2019-09-07 02:43:01" "92" "2019-09-07 02:42:46" "80" "00:00:15"
"419618819" "183" "2019-09-07 02:43:01" "84" "2019-09-07 02:42:46" "80" "00:00:15"
"417193989" "183" "2019-09-05 10:42:52" "82" "2019-09-05 10:42:23" "0" "00:00:29"
"416891691" "183" "2019-09-05 05:42:51" "78" "2019-09-05 05:42:49" "72" "00:00:02"
"416587222" "183" "2019-09-05 00:42:51" "88" "2019-09-05 00:42:35" "99" "00:00:16"
"416587223" "183" "2019-09-05 00:42:51" "82" "2019-09-05 00:42:35" "99" "00:00:16"
"416587224" "183" "2019-09-05 00:42:51" "80" "2019-09-05 00:42:35" "99" "00:00:16"
Original column types: id: Big Int, ts_tuid: Big Int, rssi: numeric, batl: real, ts_diff: interval
Here is the code:
import numpy as np
import pandas as pd
# Load data from CSV; parse_dates already turns var3/var5 into datetime64 columns
halti = (
    pd.read_csv("source.csv", parse_dates=['var3', 'var5'])
    .assign(
        # Timestamps -> seconds since the Unix epoch
        rssi_ts=lambda x: x.loc[:, 'var3'].astype('int64') / 10 ** 9,
        batl_ts=lambda x: x.loc[:, 'var5'].astype('int64') / 10 ** 9,
        # Interval ("HH:MM:SS") -> seconds
        ts_diff=lambda x: pd.to_timedelta(x.loc[:, 'ts_diff']).astype('int64') / 10 ** 9
    )
)
# Labels are the values we want to predict
labels_b = np.array(halti['var4'])
# Remove the labels from the features
# axis 1 refers to the columns
features_r = halti.drop('var4', axis = 1)
features_r2 = list(features_r.columns)
# Convert to numpy array
features_r = np.array(features_r)
# Using Scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(features_r, labels_b, test_size = 0.25, random_state = 42)
# Import the model we are using
from sklearn.ensemble import RandomForestRegressor
# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
# Train the model on training data
rf.fit(train_features, train_labels);
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)
# Calculate the absolute errors
errors = abs(predictions - test_labels)
# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2))
mape = 100 * (errors / test_labels)
# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
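The loading step above converts the two timestamp columns to seconds since the epoch and the interval to seconds. Below is a minimal sketch of that conversion, run on two of the example rows. It assumes var3/var5 are the timestamps and var7 is the "HH:MM:SS" interval, based on the sample data. The loading code itself reads the interval from a column called ts_diff, per the original column list. The sketch uses floor division by a one-second pd.Timedelta instead of astype(...) / 10 ** 9, so the result is whole seconds regardless of the datetime resolution pandas picks.

```python
import pandas as pd

# Two rows from the example data; the column roles are an assumption:
# var3 / var5 = timestamps, var7 = "HH:MM:SS" interval.
df = pd.DataFrame({
    "var3": ["2019-09-07 22:13:04", "2019-09-07 17:43:03"],
    "var5": ["2019-09-07 22:12:46", "2019-09-07 17:42:51"],
    "var7": ["00:00:18", "00:00:12"],
})

epoch = pd.Timestamp("1970-01-01")
one_second = pd.Timedelta(seconds=1)

df = df.assign(
    # Whole seconds since the Unix epoch (naive timestamps treated as UTC).
    var3_ts=lambda x: (pd.to_datetime(x["var3"]) - epoch) // one_second,
    var5_ts=lambda x: (pd.to_datetime(x["var5"]) - epoch) // one_second,
    # "HH:MM:SS" interval converted to whole seconds.
    var7_s=lambda x: pd.to_timedelta(x["var7"]) // one_second,
)

print(df[["var3_ts", "var5_ts", "var7_s"]])
```

For these rows var3_ts - var5_ts equals var7_s (18 and 12 seconds), which is a quick way to confirm which column holds the interval.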
Upvotes: 1
Views: 5663
Reputation: 518
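The output shows inf for MAPE because there are zeros in the observed values. For a test sample whose label is 0, errors / test_labels divides a positive error by zero and gives inf, so np.mean(mape) becomes inf and 100 - inf is -inf. When the dependent variable can take zero as one of its values, MAPE cannot be used as an error measure; other error measures should be used instead.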
Reference: https://rstudio-pubs-static.s3.amazonaws.com/390751_f6b763e827b24c9cb4406cd43129c8a9.html
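A minimal reproduction with made-up numbers (not the question's data): a single zero label turns one ratio into inf, which makes the mean inf and the "accuracy" -inf. The last lines show one possible replacement measure, the weighted absolute percentage error (WAPE), which divides the total absolute error by the total absolute label. WAPE is chosen here only as an example; the linked reference does not prescribe it.

```python
import numpy as np

# Made-up example (not the question's data): the third observed label is 0.
test_labels = np.array([45.0, 47.0, 0.0, 72.0])
predictions = np.array([47.0, 40.0, 3.0, 72.0])
errors = np.abs(predictions - test_labels)   # [2., 7., 3., 0.]

# Suppress the divide-by-zero warning so the result is easy to inspect.
with np.errstate(divide="ignore"):
    mape = 100 * (errors / test_labels)      # 3 / 0 -> inf at index 2

accuracy = 100 - np.mean(mape)               # mean is inf, so this is -inf
print('Accuracy:', round(accuracy, 2), '%.')

# One alternative that stays finite unless every label is 0:
# WAPE = total absolute error / total absolute label, in percent.
wape = 100 * errors.sum() / np.abs(test_labels).sum()
print('WAPE:', round(wape, 2), '%')
```

Because WAPE sums before dividing, a zero label only contributes its error to the numerator instead of producing a division by zero.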
Upvotes: 0
Reputation: 1306
You are getting -inf because MAPE is undefined when a test label is 0, which is one of several shortcomings of MAPE. If you replace
accuracy = 100 - np.mean(mape)
with
accuracy = 100 - np.mean(mape[np.isfinite(mape)])
you will get a more sensible number, computed only over the test samples whose ratio is finite (that is, whose label is nonzero).
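A sketch of what that filter does, using made-up numbers. np.isfinite drops both the inf (a nonzero error over a zero label) and the nan (a zero error over a zero label, i.e. 0/0), so the mean is taken only over samples with a nonzero label. Printing how many samples were excluded is a suggested addition, since those samples silently drop out of the score.

```python
import numpy as np

# Made-up example: two zero labels, one with a nonzero error and one with zero error.
test_labels = np.array([45.0, 47.0, 0.0, 0.0, 72.0])
predictions = np.array([47.0, 40.0, 3.0, 0.0, 72.0])
errors = np.abs(predictions - test_labels)

# Silence the divide-by-zero (x/0) and invalid (0/0) warnings for this demo.
with np.errstate(divide="ignore", invalid="ignore"):
    mape = 100 * (errors / test_labels)   # approx. [4.44, 14.89, inf, nan, 0.0]

finite = np.isfinite(mape)                # False for both inf and nan
accuracy = 100 - np.mean(mape[finite])    # mean over the 3 finite entries
dropped = np.count_nonzero(~finite)

print('Accuracy:', round(accuracy, 2), '%.')
print('Samples excluded from MAPE:', dropped)
```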
Upvotes: 3