Reputation: 1
I know about TPR, FPR, the ROC curve, and the associated AUC score; the ROC curve plots TPR against FPR. I use Python's sklearn library for all of these. But I came across this plot, which I could not understand. For different thresholds we get different FPR and TPR values, and we plot them to get the ROC curve and the associated AUC score. In this plot, however, I see curves generated for different desired AUC scores such as 0.01, 0.001, and 0.0001. How is this done? Did I describe the plot correctly?
Please suggest an approach or some code using sklearn to do this. The detection performance is described as: "The area under the ROC curve (AUC) as a single continuous measure for the detection performance that yields a minimal and maximal value of 0.0 and 1.0, respectively."
import numpy as np
# Sample TPR and FPR values for different thresholds (replace this with your data)
tpr_values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
fpr_values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
# Ensure that TPR and FPR values are in the correct order (increasing FPR)
tpr_values = np.flip(tpr_values)
fpr_values = np.flip(fpr_values)
# Associated AUC score (replace this with your calculated AUC)
original_auc = 0.75
# Your desired AUC score
desired_auc = 0.001 # Replace this with your desired AUC value
# Function to interpolate TPR and FPR values for the desired AUC at each threshold
def interpolate_tpr_fpr_for_auc(desired_auc, tpr_values, fpr_values, original_auc):
    interpolated_tpr = []
    interpolated_fpr = []
    # Interpolate at each threshold to get TPR and FPR values for the desired AUC
    for tpr, fpr in zip(tpr_values, fpr_values):
        desired_tpr = np.interp(desired_auc, [original_auc, 0], [1, tpr])
        desired_fpr = np.interp(desired_tpr, [1, tpr], [0, fpr])
        interpolated_tpr.append(desired_tpr)
        interpolated_fpr.append(desired_fpr)
    return interpolated_tpr, interpolated_fpr
# Interpolate TPR and FPR for the desired AUC at each threshold
desired_tpr_values, desired_fpr_values = interpolate_tpr_fpr_for_auc(desired_auc, tpr_values, fpr_values, original_auc)
# Print the results
print(f"Desired TPR values for AUC {desired_auc:.3f}: {desired_tpr_values}")
print(f"Desired FPR values for AUC {desired_auc:.3f}: {desired_fpr_values}")
The AUC score lies between 0 and 1, but the plot is labelled AUC(0.001) and so on.
I tried using NumPy's interpolation function to get the TPR and FPR at each threshold for my desired AUC (0.01, 0.001, and so on). I am not confident about how the plot was generated, or about my own approach.
Upvotes: 0
Views: 108
Reputation: 5095
When they say AUC(b=0.001) or similar, they are using what they define as the bounded AUC (see the final paragraph on page 7). They obtain it as follows: [1] Calculate the AUC as normal. [2] Decide on a low false-positive (FP) threshold, referred to as b. [3] Ignore the ROC curve after FP=b (i.e. the ROC curve is now bounded up to FP=b), and calculate the AUC of this clipped curve. [4] When you report an AUC value, normalise it by dividing by the clipped AUC. Their Figure 4 illustrates setting b=0.5 and using only the shaded area of the curve. The reason is that they are especially interested in small FP values, so they limit the ROC to that region.
As measuring the AUC for the full range often is of little expressiveness [...] we use the bounded AUC. That is the area under the ROC curve up to a threshold b of false-positives and normalized to that value: AUC(b). [...] it is particular important to push forward detection with few false-positives.
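One way to write the quoted definition down, assuming "normalized to that value" means dividing by b (my reading of the quote, not something the quote spells out), with f standing for the false-positive rate:

```latex
\mathrm{AUC}(b) = \frac{1}{b} \int_{0}^{b} \mathrm{TPR}(f)\,\mathrm{d}f, \qquad 0 < b \le 1
```

Under that reading a perfect detector scores 1.0 for every b, and b=1 gives back the ordinary AUC.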
Update: The code below illustrates the idea. You clip the original ROC at some FPR value and take the area under the clipped curve (yellow). In the paper, when the authors report an AUC, they first divide it by the yellow area to turn it into a percentage.
import numpy as np
import matplotlib.pyplot as plt
#Sample TPR and FPR values. Replace with your data.
tpr_orig = np.array([0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 1])
fpr_orig = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.62, 0.65, 0.7, 0.8, 0.9, 1])
#Choose an FPR threshold. This is "b" in the paper.
b = 0.33
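# Keep the FPR points below b, then close the clipped curve with a final point at FPR = b
fpr_limited = np.where(fpr_orig < b, fpr_orig, b)[:np.argmin(fpr_orig < b) + 1]
# Interpolate the original ROC curve to get the TPR at each clipped FPR point
tpr_limited = np.interp(fpr_limited, fpr_orig, tpr_orig)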
# The AUC of the original ROC (np.trapz is named np.trapezoid from NumPy 2.0 onward)
print('AUC of original ROC:', np.trapz(tpr_orig, fpr_orig).round(4))
# AUC of the clipped ROC
print(f'AUC(b={b}):', np.trapz(tpr_limited, fpr_limited).round(4))
#Plot original
plt.plot(fpr_orig, tpr_orig, '.-', color='tab:blue', linewidth=5, markersize=18, label='original ROC')
plt.fill_between(x=fpr_orig, y1=tpr_orig, color='tab:blue', alpha=0.3, label='original AUC')
#Plot clipped
plt.plot(fpr_limited, tpr_limited, '.-', color='tab:olive', linewidth=1.5,
         markersize=7, label=f'ROC(b={b})')
plt.fill_between(x=fpr_limited, y1=tpr_limited, color='tab:olive',
                 facecolor='none', hatch='XXXX', label=f'AUC(b={b})')
ax_lims = plt.axis()
plt.vlines(b, -1, tpr_limited[-1], linewidth=1.5, color='tab:olive')
plt.text(b * 1.05, tpr_limited[-1] * 0.7, f'FPR limited to b={b}', color='olive')
plt.gca().axis(ax_lims)
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title(f'ROC & AUC before and after limiting\nthe FPR to b={b}')
plt.legend()
plt.show()
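To get numbers for several bounds at once (such as the 0.01, 0.001 and 0.0001 from the question), the clipping step can be wrapped in a function. This is only a sketch: `bounded_auc` and `trapezoid_area` are names I made up, the division by b is my assumption about what "normalized to that value" means, and the FPR array is assumed to be sorted in strictly increasing order.

```python
import numpy as np

def trapezoid_area(x, y):
    """Trapezoidal-rule area under y(x); x must be sorted ascending."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def bounded_auc(fpr, tpr, b):
    """Area under the ROC curve for FPR in [0, b], divided by b.

    Dividing by b is an assumption: b is the largest area the clipped
    curve can have, so a perfect detector would score 1.0.
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    if not 0.0 < b <= 1.0:
        raise ValueError("b must lie in (0, 1]")
    keep = fpr < b
    # Keep the points left of b and close the curve with an interpolated point at FPR = b
    fpr_clip = np.append(fpr[keep], b)
    tpr_clip = np.append(tpr[keep], np.interp(b, fpr, tpr))
    return trapezoid_area(fpr_clip, tpr_clip) / b

# Same sample ROC points as above. Replace with your data.
tpr_orig = np.array([0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 1])
fpr_orig = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.62, 0.65, 0.7, 0.8, 0.9, 1])

for b in (0.33, 0.01, 0.001, 0.0001):
    print(f"AUC(b={b}) = {bounded_auc(fpr_orig, tpr_orig, b):.4f}")
```

With real data you would take `fpr` and `tpr` from `sklearn.metrics.roc_curve`. Note that `sklearn.metrics.roc_auc_score` also has a `max_fpr` argument, but it returns a standardized partial AUC (McClish correction), which rescales differently from dividing by b, so its values should not be expected to match the ones printed here.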
Upvotes: 0