Mohsen Ali
Mohsen Ali

Reputation: 701

Correct compute of equal error rate value

My work is to classifiy users' legitimate data from importers' data. I used binary SVM classifier to output probabilites of the two classes. Then used the Roc curve from sklearn to return TPR and FPR vectors over different thresholds as follows:

classifier = SVC(kernel='rbf',  probability=True,  random_state=0, gamma=0.0001, C=0.5)
classifier.fit(X_train, y_train)

y_probas = classifier.predict_proba(X_test)
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)
fnr = 1- tpr 

The output fpr and tpr vectors are as follows:

enter image description here

However, to compute the EER value, I have googled a lot and found poeple talk about different suggestions. I know that the EER score should be the value when FNR and FPR are equal (or at least the distance between them is minimum as shown in red circle in the image below).

Is this means that the EER = the difference between fnr[%] = 8.33 and fpr[%] = 10.6? I mean EER = (10.6- 8.33) = 2.27%?

Or the minimum value of the two values? I mean EER = 8.33%

Or the average of the two values ? EER = (10.6 + 8.33)/2 = 9.465

I am confused about what is the correct way to compute EER from the best FNR and FPR values.

Upvotes: 0

Views: 2571

Answers (1)

Sarthak Vajpayee
Sarthak Vajpayee

Reputation: 136

Equal Error Rate (EER) means increasing the TPR and reducing the FPR as much as possible by choosing an optimum threshold on a ROC curve. This could also be seen as maximizing TPR and minimizing FPR or simply maximizing (TPR - FPR). By maximizing, I mean bringing the value close to 1. The above formulation can also be written as minimizing (1 - (TPR - FPR)); here, minimizing means bringing the value close to 0. And since, FNR = 1 - TPR, the above function takes the form, minimizing (FNR + FPR), i.e. both FNR and FPR must be close to 0.

ROC plot

Another approach says, the point on the top left corner of the ROC curve corresponds to the best threshold. This could be calculated by imagining a line from the top left to the bottom right corner of the ROC curve. The point closest to this diagonal corresponds to the best threshold. The equation of this diagonal would be:

TPR + FPR - 1 = 0 (or close to zero)
Now, FNR = 1 - TPR, so,
FPR - FNR = 0 (close to zero).

Both of the approaches above yield similar results and reduce the number of false positives and false negatives.

References: Picture link, another thread on EER (stackoverflow)

Upvotes: 1

Related Questions