Why is my ROC curve not a smooth curve, and how can I fix it?

I plotted a ROC curve, but I didn't get a smooth (curved) line. Why, and how can I fix this?

I'm trying to detect anomalies in the Satellite dataset (http://odds.cs.stonybrook.edu/satellite-dataset/) at different contamination levels using the Isolation Forest algorithm. My code is here:

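import datetime

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.metrics import matthews_corrcoef, roc_curve, auc

# Define the contamination range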
contamination_range = np.linspace(0.01, 0.50, 50)

# Initialize empty lists to store contaminations and relevant MCC scores
mcc_scores = []
contamination = []
mcc_train = []
# For loop over different contamination levels and evaluate the model
start = datetime.datetime.now()
for cont in contamination_range:
    # Define and train Isolation Forest model
    model = IsolationForest(random_state=20, contamination=cont, n_estimators=70, max_samples=254)
    model.fit(X_train_sat, y_train_sat)

    val_preds = model.predict(X_train_sat)
    val_preds = np.where(val_preds == -1, 1, 0)
    val_MCC = matthews_corrcoef(y_train_sat, val_preds)
    mcc_train.append(val_MCC)
    # Make predictions on the validation set
    preds = model.predict(X_val_sat)
    # convert -1 to 1 and 1 to 0
    y_preds = np.where(preds == -1, 1, 0)
    # Evaluate the performance of the model on validation set
    MCC = matthews_corrcoef(y_val_sat, y_preds)

    # Append the results to the relevant lists
    mcc_scores.append(MCC)
    contamination.append(cont)
end = datetime.datetime.now()
print('--------------------------------------------------------------------------------')
print('Different contaminations:\n', np.round(contamination, 2))
print('--------------------------------------------------------------------------------')
print('Relevant MCC Scores:\n', np.round(mcc_scores, 2))
print('--------------------------------------------------------------------------------')
print('Relevant MCC Scores for train:\n', np.round(mcc_train, 2))
print('--------------------------------------------------------------------------------')
print('Execution time is: ', (end - start))
print('--------------------------------------------------------------------------------')
print(model.get_params())

And this is the code for plotting the ROC curve:

# Plot the receiver operating characteristic (ROC) curve
fpr, tpr, thresholds = roc_curve(y_test_sat, y_preds)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='Satellite ROC (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.legend()
plt.show()

Upvotes: 0

Views: 185

Answers (1)

Jon Nordby

Reputation: 6259

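You are using the predict() function. This applies a binary threshold, so its output is either -1 (anomaly) or 1 (normal). With only two distinct values, roc_curve has a single operating point to work with, which is why the plot consists of straight segments instead of a curve. To evaluate at different operating points and get a ROC curve (or PR curve, DET curve, etc.), one needs to use a continuous output score. For IsolationForest this means using the score_samples() function. Note, however, that in that case a lower value means more abnormal, so negate the scores if anomalies are your positive class (label 1).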
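A minimal sketch of the difference, assuming labels where 1 marks an anomaly and 0 a normal point. The data is synthetic and only stands in for the Satellite dataset, which is not loaded here; the variable names are illustrative. The optional drop_intermediate=False argument is used so that roc_curve keeps every threshold instead of pruning collinear points.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_curve, auc

# Synthetic stand-in for the real data (assumption): 0 = normal, 1 = anomaly
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
X_anomalous = rng.normal(loc=2.0, scale=1.0, size=(50, 4))
X = np.vstack([X_normal, X_anomalous])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(50, dtype=int)])

model = IsolationForest(random_state=20, n_estimators=70)
model.fit(X)

# Binary predictions: only two distinct values, so the ROC "curve"
# has a single interior point between (0, 0) and (1, 1)
binary_preds = np.where(model.predict(X) == -1, 1, 0)
fpr_bin, tpr_bin, _ = roc_curve(y, binary_preds)

# Continuous scores: score_samples() is lower for more abnormal points,
# so negate it to make a higher score mean "more anomalous"
scores = -model.score_samples(X)
fpr, tpr, thresholds = roc_curve(y, scores, drop_intermediate=False)
roc_auc = auc(fpr, tpr)

print("points with predict():      ", len(fpr_bin))
print("points with score_samples():", len(fpr))
print("AUC from continuous scores: ", round(roc_auc, 3))
```

The resulting fpr and tpr arrays can be passed to the same plt.plot(fpr, tpr, ...) call used in the question. Applied to the question's code, the equivalent change would be to compute the scores on the same split whose labels are given to roc_curve.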

Upvotes: 0
