Arturo Sbr

Reputation: 6323

Get the area under a ROC curve in python pyod?

I have data for 5,000 observations. I split the dataset in two: the variables (X_train) and the labeled target (y_train). I am using pyod because it seems to be the most popular Python library for anomaly detection.

I fit the model to the data with the following code:

from pyod.models.knn import KNN
from pyod.utils import evaluate_print

clf = KNN(n_neighbors=10, method='mean', metric='euclidean')
clf.fit(X_train)
scores = clf.decision_scores_

The model is now fitted, and scores holds the outlier score of each training observation (a higher score means more anomalous; these are raw scores, not probabilities). I manually calculated the area under the ROC curve and got 0.69.

I noticed this is the same result when using:

evaluate_print('KNN with k=10', y=y_train, y_pred=scores)

Which returns: KNN with k=10 ROC:0.69, precision @ rank n:0.1618.

I want to know if there is a specific function in pyod which would return only the 0.69.

Upvotes: 0

Views: 912

Answers (2)

Arvind Kumar

Reputation: 66

The pyod package itself computes the ROC AUC with sklearn.metrics.roc_auc_score. You can see that in Benchmark.ipynb in the notebooks folder of the pyod repository. So to get only the ROC AUC, call that function directly:

from sklearn.metrics import roc_auc_score

# Pass the number of digits to round(); without it the AUC is rounded to a whole number.
roc = round(roc_auc_score(y_train, scores), 2)
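Here is a minimal, self-contained sketch of that call. The labels and scores are made-up stand-ins for your y_train and clf.decision_scores_ (they are not from your data), so only the pattern carries over:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: 1 marks an outlier, 0 an inlier.
y_train = [0, 0, 1, 1]
# Hypothetical outlier scores; higher means more anomalous.
scores = [0.1, 0.4, 0.35, 0.8]

# roc_auc_score returns a single float, nothing is printed or formatted.
roc = round(roc_auc_score(y_train, scores), 2)
print(roc)  # 0.75: three of the four outlier/inlier pairs are ranked correctly
```

With your own data you would pass y_train and the scores array from the fitted detector instead of the toy lists.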

Upvotes: 0

CutePoison

Reputation: 5355

I do not know pyod, but sklearn has roc_auc_score, or roc_curve combined with auc, which do that job. Both are easy to use, and I imagine it takes only a line or two in your project.

from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(y_true=y_train, y_score=scores)
# metrics.auc integrates the curve and returns the area as a single float
auc = metrics.auc(fpr, tpr)
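As a quick sanity check, the two routes (roc_curve plus auc, or roc_auc_score in one call) should give the same number. A small sketch on invented labels and scores, which are assumptions for illustration and not your data:

```python
from sklearn import metrics

# Hypothetical toy data: 1 marks an outlier, 0 an inlier.
y_true = [0, 0, 0, 1, 1]
# Hypothetical outlier scores; higher means more anomalous.
y_score = [1.2, 0.7, 2.9, 2.5, 3.4]

# Route 1: build the ROC curve, then integrate it with the trapezoidal rule.
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)
auc_from_curve = metrics.auc(fpr, tpr)

# Route 2: compute the area in a single call.
auc_direct = metrics.roc_auc_score(y_true, y_score)

print(auc_from_curve, auc_direct)
```

If you only need the number, the single call is shorter; roc_curve is the one to use when you also want to plot the curve.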

Upvotes: 2
