Reputation: 1
I plot a ROC curve but I did't obtain a smooth line(curved line) why and how I can fix this enter image description here
What I'm trying to do is detecting anomaly from Satellite dataset from (http://odds.cs.stonybrook.edu/satellite-dataset/) at different contamination using isolation forest algorithm.my code is here
# define the contamination range
contamination_range = np.linspace(0.01, 0.50, 50)
# Initialize empty lists to store contaminations and relevant MCC scores
mcc_scores = []
contamination = []
mcc_train = []
# For loop over different contamination levels and evaluate the model
start = datetime.datetime.now()
for cont in contamination_range:
# Define and train Isolation Forest model
model = IsolationForest(random_state=20, contamination=cont, n_estimators=70, max_samples=254)
model.fit(X_train_sat, y_train_sat)
val_preds = model.predict(X_train_sat)
val_preds = np.where(val_preds == -1, 1, 0)
val_MCC = matthews_corrcoef(y_train_sat, val_preds)
mcc_train.append(val_MCC)
# Make predictions on the validation set
preds = model.predict(X_val_sat)
# convert -1 to 1 and 1 to 0
y_preds = np.where(preds == -1, 1, 0)
# Evaluate the performance of the model on validation set
MCC = matthews_corrcoef(y_val_sat, y_preds)
# Append the results to the relevant lists
mcc_scores.append(MCC)
contamination.append(cont)
end = datetime.datetime.now()
print('--------------------------------------------------------------------------------')
print('Different contaminations:\n', np.round(contamination, 2))
print('--------------------------------------------------------------------------------')
print('Relevant MCC Scores:\n', np.round(mcc_scores, 2))
print('--------------------------------------------------------------------------------')
print('Relevant MCC Scores for train:\n', np.round(mcc_train, 2))
print('--------------------------------------------------------------------------------')
print('Execution time is: ', (end - start))
print('--------------------------------------------------------------------------------')
print(model.get_params)
and the code for plotting ROC curve is here
# plot riseiver operating characteristic
fpr, tpr, thresholds = roc_curve(y_test_sat, y_preds)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='Satellite ROC (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.legend()
plt.show()
Upvotes: 0
Views: 185
Reputation: 6259
You are using the predict()
function. This applies a binary threshold, so its output is either. To evaluate at different operating points and get a ROC curve (or PR curve, DET cure, et.c) one needs to use a continuous output score. For IsolationForest
this means using score_samples()
function. However note that in that case, a lower value means more abnormal.
Upvotes: 0